Introduction

Delirium, a common neuropsychiatric problem among patients with advanced cancer1, can result in extended hospital stays, higher mortality and morbidity rates, increased healthcare costs, and considerable distress for both patients and their family members, as well as healthcare providers2,3. Among patients with advanced cancer admitted to the acute palliative care unit (APCU), delirium can affect 42–88% of individuals4. However, few comprehensive studies have thoroughly examined its prevalence and potential risk factors5. Although effective preventive interventions for delirium in hospital settings are currently lacking, physicians and healthcare providers can alleviate modifiable risk factors within the APCU by providing exercise programs and family support to reduce the occurrence of delirium6. Therefore, early recognition and prevention are essential in patients with risk factors for developing delirium7.

To date, nurse-administered questionnaires have mainly been used to predict the risk of delirium in hospitalized patients8. However, physicians may find it challenging to conduct daily assessments through questionnaires. Machine learning models have recently been introduced as an alternative9,10,11,12,13,14. They have been used to predict delirium among patients after surgery for degenerative spinal disease10, patients admitted to the intensive care unit11, hospitalized patients without cognitive impairment12, patients admitted to the general ward13, and older patients after general surgery14. A previous study also used machine learning models to predict delirium in patients with advanced cancer receiving pharmacological interventions9. However, that study was limited to patients taking antipsychotic medications or trazodone, and it provided no operational criteria for determining the precipitating factors of delirium9. The area under the receiver operating characteristic curve (AUROC) for these studies ranged from 0.666 to 0.964.

A machine learning model for predicting delirium in patients with advanced cancer has been explored, with suggested advantages9. However, that study considered only a decision-tree model, which is prone to instability because a small change in the data can result in a major change in the structure of the model. Therefore, a comprehensive study using multiple machine learning models is needed to more accurately assess the features of delirium in patients with advanced cancer admitted to the APCU. We aimed to develop and compare a variety of machine learning models to predict delirium in patients with advanced cancer admitted to the APCU and to investigate the features that most influenced the models.

Materials and methods

Data source and study population

Our study utilized a multicenter, patient-based registry cohort collected from four hospitals in South Korea: Seoul National University Bundang Hospital, Yonsei University Severance Hospital, CHA University Bundang Medical Center, and Seoul National University Hospital. We identified potential participants as patients with advanced cancer admitted to the APCU at the four centers between January 1, 2019, and December 31, 2020. A total of 2328 patients met the eligibility criteria: (1) aged 20 years or older; (2) diagnosed with advanced solid cancer; and (3) admitted to the APCU. Of these, we excluded five patients with a hospital stay exceeding 3 months, six patients transferred to other departments, and three patients with terminal delirium, defined as delirium that occurred within 2 weeks of death. Our final sample consisted of 2314 patients with advanced cancer who were admitted to the APCU and who met all eligibility criteria15.

The study protocol received approval from the Institutional Review Boards of each center (CHA University, CHAMC 2021-03-054-002; Seoul National University, H-2103-028-1201; Seoul National University Bundang Hospital, B-2104/681-405; and Yonsei University, 4-2021-0323). The requirement for informed consent was waived by the Institutional Review Board of each center (CHA University; Seoul National University; Seoul National University Bundang Hospital; and Yonsei University) because only anonymized data were examined. The researchers of this study confirm that all methods were performed in accordance with the relevant guidelines and regulations. In particular, this research followed the guidelines outlined in the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement (Table S1).

Variables for machine learning

A total of 39 variables were used in this study. They were selected based on several previous studies predicting delirium and on the variables available in the APCU16,17,18. Based on these results, we proceeded with the establishment of a national registry within the National Registry Project19, excluding data whose construction was deemed infeasible. The dataset included general information20,21 such as age, sex, chemotherapy during hospitalization, living situation, medical aid recipients, education level, use of glasses or hearing aids, and history of alcohol consumption and smoking. Clinical risk factors (such as obesity, blood pressure, and body temperature), laboratory results (such as blood tests and C-reactive protein levels), and a history of diseases including delirium, cardiovascular disease, diabetes mellitus, respiratory disease, liver disease, mental illness, and head injury were also collected. Because we aimed to predict the onset of delirium in patients with advanced cancer immediately upon APCU admission, all baseline data were obtained at the time of admission to the APCU.

To identify delirium, we reviewed medical records based on the criteria outlined in the Fifth Edition of the Diagnostic and Statistical Manual of Mental Disorders. A well-trained physician and an academic nurse conducted this detailed review. Based on a previous validation study, we did not use the codes from the 10th revision of the International Classification of Diseases because they were deemed unreliable, with low sensitivity22. Instead, we recorded all potential symptoms, signs, and associated medications and had at least two specialists (BDK and YJK) review each case. In case of any disagreement between the specialists, an additional specialist (SHY) was consulted to make the final decision.

The primary objective of this study was to predict the occurrence of delirium in patients with advanced cancer admitted to the APCU using machine learning models. To achieve this, the data were split into training and testing sets at a ratio of 80:20, with the training set comprising 1851 (80%) patients and the testing set comprising 463 (20%) patients. For feature normalization, the mean and standard deviation of each feature were first computed within the training set. These training-set statistics were then applied to both the training and testing datasets, so that each feature in the training set had a mean of zero and a standard deviation of one. The proposed machine learning models were validated through a stratified fivefold cross-validation process on the training data, followed by further validation using independent testing data23,24,25,26,27.
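The splitting and normalization steps described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' implementation: the function names, the random seed, the use of a stratified 80:20 split, and the population standard deviation are all assumptions, and the data are synthetic. In scikit-learn the same steps are usually expressed with `train_test_split`, `StandardScaler`, and `StratifiedKFold`.

```python
import random
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant features
    return means, stds


def transform(rows, means, stds):
    """Apply previously fitted statistics to any dataset (training or testing)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample a fold number 0..k-1, spreading each class evenly."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            fold_of[i] = position % k
    return fold_of


# Synthetic stand-in data (not study data): 100 samples, 10% positive.
features = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i % 10 == 0 else 0 for i in range(100)]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_rows = [features[i] for i in train_idx]
test_rows = [features[i] for i in test_idx]

means, stds = fit_scaler(train_rows)             # statistics from training data only
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)  # reuses the training statistics

train_labels = [labels[i] for i in train_idx]
folds = stratified_folds(train_labels, k=5)
```

Fitting the scaler on the training set alone keeps information from the testing set out of model development, which is why the testing features are not guaranteed to have exactly zero mean after transformation.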

Machine learning models and evaluation metrics

We evaluated seven machine learning algorithms for predicting delirium in patients with advanced cancer: extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), gradient boosting (GBM), light gradient boosting (LGBM), logistic regression (LR), support vector machine (SVM), and random forest (RF). The hyperparameters of each algorithm were optimized with an exhaustive search, which evaluates every combination in a predefined set of hyperparameter values under fivefold cross-validation and retains the combination yielding the best performance. To estimate the uncertainty and variability of our results, we calculated the AUROC, sensitivity, specificity, accuracy, and balanced accuracy scores during the fivefold cross-validation process. These metrics were calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in binary classification using the following formulas:

$$Sensitivity= TPR=\frac{TP}{TP+FN}$$
$$Specificity=1-FPR=\frac{TN}{TN+FP}$$
$$Balanced\,\, accuracy=\frac{Sensitivity+Specificity}{2}$$
$$AUROC={\int }_{0}^{1}TPR\left({FPR}^{-1}\left(x\right)\right)dx$$
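As a worked illustration of these metrics, the sketch below computes them for a small set of made-up labels and scores (not study data). Specificity is computed with its standard definition, TN / (TN + FP). The AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half, which equals the area under the empirical ROC curve. The function names and the 0.5 decision threshold are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded 1 (delirium) and 0 (no delirium)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP), i.e. 1 - FPR."""
    return tn / (tn + fp)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


def auroc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)                # 2/3
spec = specificity(tn, fp)                # 6/7
bal_acc = balanced_accuracy(tp, tn, fp, fn)
area = auroc(y_true, scores)              # 19/21
```

Sensitivity, specificity, and balanced accuracy depend on the chosen threshold, whereas the AUROC is computed from the scores directly and so summarizes performance across all thresholds.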

We adopted AUROC as the evaluation metric for measuring the overall performance of the model. AUROC is commonly used in binary classification, is relatively insensitive to class imbalance, and represents the relationship between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold changes.
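With AUROC as the selection criterion, the exhaustive hyperparameter search described earlier in this section amounts to scoring every combination in a grid and keeping the best one. The sketch below shows only that control flow. The grid values, the parameter names `max_depth` and `learning_rate`, and the scoring function `toy_cv_auroc` are hypothetical placeholders; in a real pipeline the scoring function would train the model on four folds, evaluate it on the fifth, and return the mean AUROC over the five folds. scikit-learn offers `GridSearchCV` for this purpose, although the source does not state which implementation was used.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every hyperparameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean AUROC over five folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_auroc(params):
    """Placeholder for a real cross-validated AUROC; peaks at depth 4, rate 0.1."""
    depth_penalty = 0.01 * abs(params["max_depth"] - 4)
    rate_penalty = 0.2 * abs(params["learning_rate"] - 0.1)
    return 0.75 - depth_penalty - rate_penalty


# Hypothetical grid: 3 x 3 = 9 combinations are evaluated.
grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_cv_auroc)
```

The cost of an exhaustive search grows multiplicatively with the number of values per hyperparameter, so the grid is normally kept small for each of the seven algorithms.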

To further enhance the performance of the machine learning model, we employed an ensemble approach. This technique combines multiple models to improve prediction accuracy and robustness. We created various groups of models by combining all possible model combinations and evaluated their performances to determine the best combination. This approach leveraged the strengths of each individual model while mitigating any weaknesses or limitations.
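One way to search over model combinations is sketched below. It assumes that an ensemble's prediction is the unweighted average of its members' predicted probabilities (soft voting) and that combinations are ranked by AUROC; the source does not specify the combination rule, so both are assumptions. The probabilities are made-up values for six hypothetical patients, chosen only to make the example run.

```python
from itertools import combinations


def auroc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_probabilities(prob_lists):
    """Soft voting: average the predicted probabilities sample by sample."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def best_ensemble(model_probs, y_true, min_size=2):
    """Score every combination of at least min_size models; return the best."""
    names = sorted(model_probs)
    best_names, best_score = None, float("-inf")
    for size in range(min_size, len(names) + 1):
        for combo in combinations(names, size):
            averaged = average_probabilities([model_probs[n] for n in combo])
            score = auroc(y_true, averaged)
            if score > best_score:
                best_names, best_score = combo, score
    return best_names, best_score


# Illustrative validation-set outputs (not study data).
y_true = [1, 1, 0, 0, 0, 0]
model_probs = {
    "RF":      [0.9, 0.4, 0.5, 0.2, 0.1, 0.1],
    "XGBoost": [0.3, 0.8, 0.2, 0.6, 0.1, 0.1],
    "LGBM":    [0.2, 0.2, 0.9, 0.9, 0.1, 0.1],
}
best_names, best_score = best_ensemble(model_probs, y_true)
```

In this toy case, RF and XGBoost each misrank a different patient, so their average ranks every positive above every negative. This is the sense in which an ensemble can offset the weaknesses of its individual members.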

For each of the best performing machine learning models, we investigated the feature importance, which is a measure of how influential a feature was in splitting a class when branching a node in a tree-based model.
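The Results report feature importances that were averaged across the ensemble members and normalized so that the top-ranked feature has a value of 1.00. A minimal sketch of that aggregation follows, assuming normalization by the maximum averaged value (consistent with a top score of 1.00). The importance values and the reduced set of three features are hypothetical, not the study's.

```python
def rank_features(importances_by_model):
    """Average importances across models, scale so the maximum is 1.0, and rank."""
    models = list(importances_by_model.values())
    feature_names = sorted(models[0])
    averaged = {
        name: sum(model[name] for model in models) / len(models)
        for name in feature_names
    }
    top = max(averaged.values())
    normalized = {name: value / top for name, value in averaged.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importances from two tree-based models.
importances_by_model = {
    "XGBoost": {"sex": 0.30, "age": 0.10, "history_of_delirium": 0.20},
    "RF":      {"sex": 0.20, "age": 0.14, "history_of_delirium": 0.20},
}
ranking = rank_features(importances_by_model)
ranked_names = [name for name, _ in ranking]
```

Because the values are scaled relative to the top feature, they indicate relative influence within the model rather than effect sizes, and they carry no information about the direction of an association.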

We utilized several popular software tools, including Python 3.9.7 (Python Software Foundation, Wilmington, DE, USA), TensorFlow-gpu 2.6.0, Keras 2.6.0, NumPy 1.21.5, Pandas 1.4.1, Matplotlib 3.5.1, and Scikit-learn 1.0.2, to implement the machine learning models28,29,30.

Machine learning-driven public website development

We also deployed our machine learning model on a public website (http://ai-wm.khu.ac.kr/Delirium/), enabling the prediction of delirium from 39 items of patient information. Upon accessing the website, users enter patient information, which is encoded on the website server, allowing for an immediate delirium prediction result. No private information beyond the selected 39 items needs to be entered, and all entered information is promptly deleted once the prediction result is obtained, minimizing the risk of information exposure.

Informed consent

The institutional review board of the four centers approved this study and waived the requirement for informed consent because only anonymized data were examined.

Ethics statement

The protocol was approved by the institutional review boards of the four centers (CHA University, CHAMC 2021-03-054-002; Seoul National University, H-2103-028-1201; Seoul National University Bundang Hospital, B-2104/681-405; and Yonsei University, 4-2021-0323).

Results

This study utilized a multicenter, patient-based registry cohort collected from four hospitals in South Korea to develop and investigate machine learning models for predicting delirium in patients with advanced cancer. Table 1 displays the baseline characteristics of the study population. In the original cohort, 165 (7.1%) patients experienced delirium.

Table 1 Included variables for an artificial intelligence model and patient information (total n = 2314).

Table 2 compares the fivefold cross-validation performance of each single model and of the ensemble machine learning model in terms of sensitivity, specificity, balanced accuracy, and AUROC. In terms of balanced accuracy and AUROC, three models (RF, XGBoost, and LGBM) demonstrated the highest performance among the single models. To further improve classification performance, we adopted an ensemble approach using these three higher-performing single models. The results revealed that the combination of XGBoost and RF provided the best performance, achieving 68.83% sensitivity, 70.85% specificity, 69.84% balanced accuracy, and 74.55% AUROC. Subsequently, we performed feature importance analysis using the ensemble model that combines XGBoost and RF. We averaged and normalized the feature importance values from the two models and ranked each feature. Figure 1 presents the normalized, ranked feature importance values for all 39 features used to predict delirium in patients with advanced cancer. The results indicated that sex (1.00) had the highest importance value and was the primary contributor to predicting delirium, followed by a history of delirium (0.82), chemotherapy during hospitalization (0.81), smoking status (0.73), alcohol consumption (0.67), living with family (0.49), and age (0.47).

Table 2 Five-fold cross validation result comparison according to machine learning models.
Figure 1

Ranked feature importance values for all 39 features. WBC white blood cell count, PLT platelets, AST aspartate transaminase, ALT alanine transaminase, BUN blood urea nitrogen.

We validated the performance of the machine learning models using an isolated testing dataset. Table 3 summarizes the delirium prediction results on the testing dataset. Here too, the combination of XGBoost and RF provided the best performance, with 75.76% sensitivity, 52.63% specificity, 64.19% balanced accuracy, and 73.11% AUROC. The balanced accuracy and AUROC on the testing data were similar to the fivefold cross-validation results, indicating minimal overfitting or underfitting in the model.

Table 3 Delirium prediction results from the testing dataset.

Furthermore, we deployed our artificial intelligence (AI) model on a public website (http://ai-wm.khu.ac.kr/Delirium/) to allow public access to delirium prediction for patients with advanced cancer. Figure 2 displays the website of the deployed AI model. Figure 2a illustrates the user web interface for entering information, where users input the 39 features, such as sex, age, chemotherapy during hospitalization, living with family, medical aid recipients, and education level. Upon entering the information into the web application, users can immediately obtain the delirium prediction results, as shown in Fig. 2b. The prediction results include the probability of delirium.

Figure 2

Deployed web application predicting delirium: (a) user input, (b) prediction results with delirium probability in patients with advanced cancer.

Discussion

Key findings

The results suggest that machine learning models can predict delirium in patients with advanced cancer admitted to the APCU with relatively high accuracy. The combination model of XGBoost and RF demonstrated the best performance for predicting delirium in these patients, achieving a balanced accuracy of 69.84% and an AUROC of 74.55%. This performance was validated through both k-fold cross-validation and testing on an isolated dataset. Notably, sex emerged as the most important feature for predicting delirium in patients with advanced cancer, followed by a history of delirium, chemotherapy during hospitalization, smoking status, alcohol consumption, living with family, and advanced age. To the best of our knowledge, this study represents the first attempt to use a machine learning model to predict delirium in South Korean patients with advanced cancer. These findings underscore the importance of delirium screening in APCU-admitted patients with advanced cancer and contribute to identifying the most significant risk factors for this patient group.

Comparison with previous studies

Our results, particularly in the combination model of XGBoost and RF, corroborate previously reported risk factors associated with delirium. Earlier research indicated that advanced age, a history of delirium, smoking status, alcohol consumption, and sex were associated with delirium in patients with advanced cancer admitted to the APCU31,32,33,34. Male sex was identified as a significant risk factor for neuropsychiatric disorders, potentially due to the protective role of estrogen in individuals with potential cognitive impairments35,36. Males may exhibit more pronounced neuropsychiatric disorders under acute stress, driven by different corticotropin-releasing factor signaling pathways compared with females37. Consistent with prior studies, our findings highlight old age as a significant risk factor for delirium in patients with advanced cancer38,39,40, with possible contributing factors being atherosclerosis and malnutrition common in older patients40,41,42. The association of cigarette smoking with delirium is attributed to nicotine withdrawal during hospitalization1. Smokers have been noted to display more severe agitation, characteristic of hyperactive delirium43. Changes in various neurotransmitter systems, including dopamine, opioids, and cholinergic systems, have been implicated in shared hyperactive delirium44. The relationship between chemotherapeutic agents and delirium remains controversial and inconsistent, as reported in single case reports or studies with small populations. Previous studies have suggested that patients who undergo multiple chemotherapy regimens could experience delirium, which may occur in approximately one in 11 adults receiving chemotherapy45,46. Chemotherapeutic agents may penetrate the blood–brain barrier, potentially serving as a risk factor for delirium47,48. 
Similar to our study, a previous study predicted delirium in patients with advanced cancer receiving pharmacological intervention through a visually interpretable prediction model9. That model has the advantage of being easy to use with a small number of variables, but it depends on the Delirium Rating Scale Revised-98 and is limited to predicting delirium within three days. In contrast, our study provides a publicly accessible web application built on a machine learning model, which could serve as a medical aid for healthcare providers monitoring delirium in patients with advanced cancer.

Strengths and limitations

The primary strength of this study lies in the relatively high accuracy of the machine learning model for detecting delirium in patients with advanced cancer, as validated by testing datasets. Consistent AUROC values in both the training and testing datasets indicate that the combination model of XGBoost and RF is capable of predicting delirium in patients with advanced cancer. Important predictors of delirium include sex, history of delirium, chemotherapy during hospitalization, smoking status, alcohol consumption, living with family, and advanced age. The dataset was collected from four academic cancer centers, involving oncology-trained physicians and healthcare providers, providing a comprehensive view of risk factors associated with delirium in patients with advanced cancer and potentially aiding in the development of effective preventive interventions.

However, this study had several limitations. First, the datasets were collected from patients admitted to four hospitals and were heterogeneous, potentially limiting the generalizability of the model to the general population. Second, delirium assessment tools, diagnostic criteria, observation frequency, and timeframes may differ from those used in clinical trials. Third, machine learning models often benefit from larger datasets, but the sample size of this study was limited. Fourth, our proposed machine learning model underperformed compared to previous studies predicting delirium across varying patient conditions49,50. Given the limitations of our registry construction project, we did not collect data at various time points, and additional research may be necessary to address this gap. Fifth, the dataset of this study lacks information pertaining to delirium-related medications or disease history. However, we have initiated the establishment of a new prospective cohort to supplement the inadequate input data, and we plan to conduct further research to develop more sophisticated machine learning models. Sixth, due to the retrospective design of our registry for patients with advanced cancer, it was not feasible to distinguish between different types of delirium (hyperactive, hypoactive, and mixed). We are fully aware of this limitation and are making efforts to differentiate between them in our newly established prospective cohort. Seventh, a larger dataset is required to apply machine learning models and achieve external validation. Lastly, an imbalance in the number of patients in each group may limit the performance of the models51,52.

Clinical and policy implications

To the best of our knowledge, this study represents the first creation of a machine learning model for predicting delirium in patients with advanced cancer admitted to the APCU. The use of this machine learning model for delirium prediction in APCU-admitted patients with advanced cancer can significantly improve patient quality of life and reduce physician workload. Especially for Korean healthcare providers with less educational experience in delirium53, the machine learning-based delirium prediction model of patients with advanced cancer could be part of a medical aid. Delirium episodes are particularly common in patients with advanced cancer in the APCU, with prevalence increasing as the terminal phase of the illness approaches. However, delirium in these patients has been inadequately identified and managed. Our model has the potential to profoundly impact risk assessment, early detection, and effective interventions for delirium in patients with advanced cancer.

Conclusion

Using a large-scale, multicenter, patient-based registry cohort, we developed a machine learning prediction model for delirium in South Korean patients with advanced cancer. Our study revealed that the combination of XGBoost and RF delivered the best performance, a conclusion validated by the results of both k-fold cross-validation and the isolated testing dataset. Additionally, we identified sex as the primary predictor of delirium, followed by history of delirium, chemotherapy, smoking status, alcohol consumption, and living with family. Furthermore, we have made our AI model accessible to the public through a dedicated website (http://ai-wm.khu.ac.kr/Delirium/) to provide delirium prediction results for patients with advanced cancer. Although external validation using prospectively collected data may be necessary to further refine and validate the model, we have implemented a web application to gather additional data. Notably, the application does not store any user-entered information at present. However, we plan to securely store user-entered information with consent, facilitating a real-time learning process to enhance the machine learning model.