Abstract
Liver injury during cancer treatment is highly detrimental to patients. The risk factors for drug-induced liver injury (DILI) in the pancreatic cancer population have not been investigated. This study aimed to develop and validate an interpretable decision tree (DT) model for the early prediction of DILI in pancreatic cancer patients using multitemporal clinical data and to screen for related risk factors. Data from 307 patients were collected retrospectively; the training set (n = 215) was used to develop the model, and the test set (n = 92) was used to evaluate it. The classification and regression trees (CART) algorithm was employed to establish the DT model. The SHapley Additive exPlanations (SHAP) method was used to facilitate clinical interpretation. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) and the Hosmer-Lemeshow test. The DT model exhibited high diagnostic efficacy, with AUC values of 0.995 and 0.994 in the training and test sets, respectively. Four risk factors associated with DILI occurrence were identified: delta-albumin, delta-ALT, post (AST: ALT), and post-GGT. The interpretable DT model based on multiperiod liver function indicators predicted DILI occurrence in the pancreatic cancer population and contributes to personalized clinical management of pancreatic cancer patients.
Introduction
Chemotherapy, one of the primary treatments for malignant tumors, can extend patient survival1. However, it often brings drug toxicity and severe side effects that affect postoperative outcomes and organ function2. Drug-induced liver injury (DILI) is a common cause of liver impairment. Liver injury due to antitumor drugs accounts for a substantial fraction of DILI cases and is one of the factors preventing patients from completing an entire course of chemotherapy3,4. Neoadjuvant chemotherapy offers advantages such as improved resection rates and reduced postoperative recurrence, and it plays a key role in treating resectable and borderline resectable pancreatic cancer5,6. However, liver injury during neoadjuvant chemotherapy poses a significant challenge, impeding conversion therapy and affecting postoperative survival7,8. Despite these challenges, clinical investigation of DILI in the pancreatic cancer population remains sparse.
The onset of DILI is unpredictable, with diverse clinical manifestations9. According to the European Association for the Study of the Liver (EASL) guidelines, abnormalities in serologic parameters such as alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TBIL), and albumin (ALB) are used to diagnose acute DILI. Although the value of monitoring liver function tests is debated, regular liver function assessments during chemotherapy are still advised for oncology patients10,11. However, accurate and effective laboratory tests for predicting DILI are currently lacking, especially in pancreatic cancer patients. Liver biopsy is more accurate but is invasive and is not recommended for early or routine diagnosis of DILI12. Exploration of the risk factors for liver injury during chemotherapy remains very limited.
The development of machine learning applications has enhanced modern medicine’s capabilities to predict and diagnose diseases13,14. The SHAP interpretability approach further enhances the clinical applicability of a predictive model, promotes clinical understanding, and is a practical tool for assisting machine learning in predicting and diagnosing diseases15. The decision tree algorithm, a machine learning classification approach, can adaptively process large volumes of data in a short time and accommodate diverse variables16. It overcomes some limitations of traditional algorithms in complex data processing and is less affected by collinearity among variables17. The decision tree flow chart for predicting liver injury constructed by Asai et al.18 can be used to compare the incidence of liver injury between different drugs and has the potential to be a convenient tool to help medical staff with assessment before drug administration.
Therefore, the purpose of this study was to develop a decision tree prediction model based on laboratory indicators from multiple time phases. The goals were to detect liver injury early, to identify independent risk factors associated with DILI in pancreatic cancer patients, to extend the benefits of neoadjuvant chemotherapy to a broader pancreatic cancer cohort, and to offer clinical guidance for diagnosis and treatment decisions.
Materials and methods
1) Patient screening
This study followed the Declaration of Helsinki and was approved by the hospital’s Ethics Committee. A total of 1,406 patients diagnosed with pancreatic cancer were retrospectively enrolled from January 2016 to January 2023 (Fig. 1). Inclusion criteria: (1) informed consent to participate; (2) receipt of pancreatic cancer-related anticancer therapy; (3) receipt of pre- and posttreatment liver biochemistry assessments. Exclusion criteria: (1) concurrent bile duct cancer or metastatic hepatocellular carcinoma; (2) viral, alcoholic, fatty, or other liver injury; (3) missing or incomplete clinical data. According to the 2019 Clinical Practice Guidelines for Drug-induced Liver Injury of the European Association for the Study of the Liver (EASL)10, one of the following conditions is required to diagnose DILI: (1) ALT ≥ 5 × upper limit of normal (ULN); (2) alkaline phosphatase (ALP) ≥ 2 × ULN, especially with elevated gamma-glutamyltransferase (GGT) (to exclude bone disease); (3) ALT ≥ 3 × ULN and TBIL > 2 × ULN. Finally, 133 patients met the diagnostic criteria for DILI, and 174 patients displayed no abnormality in liver function during chemotherapy. All patients were randomly divided 7:3 into training and test sets. The training set (n = 215) was used to develop the model, and the test set (n = 92) was used to evaluate the model (Figs. 1 and 2).
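The three biochemical thresholds listed above can be written as a small rule. The following is a minimal sketch, not the authors' screening code: the `LiverPanel` class and its field names are hypothetical, values are assumed to be already expressed as multiples of the ULN, and the accompanying GGT check used to exclude bone disease is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class LiverPanel:
    """Hypothetical container: each value is a multiple of the upper limit of normal."""
    alt_x_uln: float
    alp_x_uln: float
    tbil_x_uln: float


def meets_dili_criteria(panel: LiverPanel) -> bool:
    """Return True if at least one of the three thresholds quoted in the text is met."""
    alt_alone = panel.alt_x_uln >= 5                                      # criterion (1)
    alp_alone = panel.alp_x_uln >= 2                                      # criterion (2), GGT check omitted
    alt_with_bilirubin = panel.alt_x_uln >= 3 and panel.tbil_x_uln > 2    # criterion (3)
    return alt_alone or alp_alone or alt_with_bilirubin


# Illustrative values only, not patient data.
print(meets_dili_criteria(LiverPanel(alt_x_uln=3.5, alp_x_uln=1.0, tbil_x_uln=2.5)))
```

A rule of this kind only labels the outcome; causality assessment and the exclusion of other causes of liver injury, as described in the exclusion criteria, still have to be handled separately.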
Data collection
Clinical data from all patients were retrospectively retrieved from the picture archiving and communication system (PACS). The data included sex, age, therapy duration, BMI, presence of diabetes and hypertension, tumor (T) stage, nodal (N) stage, lymphatic metastasis (LN), degree of pathological differentiation, and pre- and posttreatment liver function-related laboratory markers, including ALT, AST, TBIL, and ALB, among others. Furthermore, from the liver function indicators we calculated delta features, defined as the relative change in a feature value between the two phases: the difference between the posttreatment and pretreatment values divided by the pretreatment value. To ensure data integrity and accuracy, two clinicians, A and B, each with 5 to 10 years of experience, independently evaluated all data (Set A and Set B). For quantitative parameters, a consensus was reached through joint assessment.
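The delta definition given above is simply a relative change, delta = (post - pre) / pre. A minimal sketch follows; the function name and the example values are illustrative assumptions rather than study data.

```python
def delta_feature(pre: float, post: float) -> float:
    """Relative change between phases: (posttreatment - pretreatment) / pretreatment."""
    if pre == 0:
        raise ValueError("pretreatment value must be non-zero")
    return (post - pre) / pre


# Illustrative values only: ALT rising from 40 to 100 U/L, albumin falling from 40 to 32 g/L.
delta_alt = delta_feature(pre=40.0, post=100.0)
delta_albumin = delta_feature(pre=40.0, post=32.0)
print(delta_alt, delta_albumin)
```

Because the change is scaled by the baseline, a delta is unitless and comparable across indicators, but it becomes unstable when the pretreatment value is close to zero.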
Statistical analysis
Statistical analyses were performed using SPSS (version 27.0), R (version 4.1), and Python (version 3.7). Associations were assessed through both univariate and multivariate analyses. Categorical variables were evaluated using the chi-squared test, while continuous variables were assessed via the nonparametric Mann-Whitney test. The Shapiro-Wilk test was employed to assess normality, and variables that deviated from a normal distribution were reported as the median (interquartile range). P < 0.05 was considered statistically significant. Spearman’s rank correlation test was employed to compute the correlation coefficient (ICC) for each feature between Set A and Set B. Features with an ICC above 0.8 were considered adequate and were included in subsequent analyses.
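As an illustration of the categorical comparison described above, the sketch below computes a Pearson chi-squared test for a 2 × 2 table using only the standard library. It is not the SPSS or R procedure used in the study: no continuity correction is applied, the function name is hypothetical, and the counts are invented for the example.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one case")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: rows = hypertension yes/no, columns = DILI yes/no.
stat, p = chi_squared_2x2([[20, 10], [10, 20]])
print(f"chi2 = {stat:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```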
Model construction and evaluation
All factors in the training cohort were analyzed first by univariate logistic regression and then by multivariate logistic regression. Factors with a P value of less than 0.05 were identified as independent predictors. Subsequently, pretreatment, posttreatment, and delta clinical models were established. A nomogram was constructed to enhance clinical applicability, yielding a DILI risk value for each case.
Furthermore, backward stepwise logistic regression was used to identify independent predictors for the construction of the decision tree model, which can highlight potential nonlinear associations between variables and outcomes. The predictive efficacy of the established models was evaluated via receiver operating characteristic (ROC) analysis. Calibration performance was evaluated through nomogram calibration curves, while model fit was assessed using the Hosmer-Lemeshow test. The DeLong test was employed to compare the AUCs of the individual models. Finally, to ensure the clinical relevance of our findings, the SHAP method was employed to interpret the decision tree model and identify pertinent risk factors.
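The AUC used in the ROC analysis has a simple rank interpretation: it is the probability that a randomly chosen DILI case receives a higher predicted risk than a randomly chosen non-DILI case, with ties counted as one half. The sketch below implements that definition directly; the function name and the toy scores are assumptions for illustration and are unrelated to the study's models.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = DILI, 0 = no DILI; scores are predicted risks.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; library implementations typically use sorting instead.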
Ethics approval and consent to participate
This retrospective study was approved by the Medical Ethics Committee of Zhejiang Provincial People’s Hospital (No. KT2022053) and followed the Declaration of Helsinki. The committee waived the need for informed consent because of the retrospective nature of the study. All experiments were performed according to relevant guidelines and regulations.
Results
1) Clinical characteristics of patients
In the univariate analysis, chemotherapy cycle, hypertension, pre-albumin, post-ALP, delta-platelet, and other variables differed significantly between patients with and without DILI (P < 0.05; statistics are shown in Table 1 and complete data in Supplementary Table 1). Between the training and test sets, no features other than N stage and pre-AST differed significantly (Table 2; complete data in Supplementary Table 2).
The types and proportions of chemotherapy regimens are shown in Fig. 3; FOLFIRINOX was the main treatment regimen for the patients in this study.
2) Model construction and evaluation
Following multivariate logistic analysis, prealbumin, pre-ALP, and pre-TBIL were identified as independent predictors of DILI and were incorporated into the pretreatment clinical model (Supplementary Tables 3, 4, 5). The training set AUC value was 0.752. Post-alanine, post-ALT, post (AST: ALT), and post-GGT were incorporated into the posttreatment clinical model, yielding an AUC of 0.998 in the training set. The delta model featured delta-albumin and delta-ALT, achieving an AUC of 0.998 in the training set. The Hosmer-Lemeshow test of the nomogram showed consistency with the ideal curves (Figs. 4, 5, 6, 7).
Backward stepwise logistic regression identified delta-albumin, delta-ALT, post (AST: ALT), and post-GGT as independent predictors of DILI, and these were used to construct the decision tree model. The model’s AUC values were 0.995 and 0.994 in the training and test sets, respectively (Fig. 8, Table 3). The DeLong test demonstrated that, except for the pretreatment clinical model, the differences among the other three models were not statistically significant (P > 0.05) (Table 4). Furthermore, the SHAP algorithm provided insights into individual variable contributions within the decision tree model. The SHAP heatmap and feature plot highlight the importance of the features and show how shifts in each variable affect the prediction (Figs. 9, 10). Notably, post-GGT, delta-ALT, and delta-albumin exerted positive influences, contributing to predictions of DILI. The individual force plot showed the substantial predictive contributions of delta-albumin, post-T, and post (AST: ALT), particularly for pancreatic cancer patients with liver injury. The red arrows indicate the factors with the most positive impact on the prediction of hepatic injury (Fig. 11).
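To make the idea of a per-feature SHAP contribution concrete, the sketch below computes exact Shapley values for a hand-written two-feature toy tree. It is a simplification under stated assumptions, not the SHAP library or the study's model: absent features are replaced by a single baseline value rather than averaged over a background dataset, and the tree, its thresholds, and the patient values are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to a single baseline point."""
    names = list(x)
    n = len(names)

    def value(present):
        # Features in `present` take the patient's value; the rest take the baseline value.
        mixed = {k: (x[k] if k in present else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {name}) - value(present))
        phi[name] = total
    return phi


def toy_tree(f):
    """Invented two-split tree returning a DILI risk; thresholds are not from the study."""
    if f["delta_alt"] > 1.0:
        return 0.9
    if f["post_ggt"] > 60.0:
        return 0.6
    return 0.1


patient = {"delta_alt": 2.0, "post_ggt": 80.0}
reference = {"delta_alt": 0.0, "post_ggt": 30.0}
contributions = shapley_values(toy_tree, patient, reference)
print(contributions)
# The contributions sum to the gap between the patient's and the reference prediction.
print(sum(contributions.values()), toy_tree(patient) - toy_tree(reference))
```

The additivity shown in the last line is what a force plot visualizes: each arrow is one feature's share of the difference between the individual prediction and the reference value.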
Discussion
Neoadjuvant therapy plays a key role in conversion therapy for patients with pancreatic cancer19. However, the characteristics associated with liver injury in pancreatic cancer patients remain unclear. Currently, liver injury is a diagnosis of exclusion20,21. To explore the best independent predictors, this study collected clinical data from two time phases, before and after antitumor therapy, in pancreatic cancer patients. We also calculated the change between the two phases. The resulting delta features reflect the dynamic changes in liver function indicators, making the model data more comprehensive22. This study used a decision tree (DT) approach to explore independent variables that could identify the risk of DILI in pancreatic cancer patients. We obtained four risk factors associated with its occurrence: delta-albumin, delta-ALT, post-GGT, and post (AST: ALT). The DT prediction model constructed from these four factors performed well, with AUC values of 0.995 and 0.994 for the training and test sets, respectively. The use of SHAP further illustrates the influence of these risk factors in patients with pancreatic cancer and reinforces the model’s interpretability and clinical utility.
As a machine-learning model, the decision tree can consider the interactions between variables and select independent predictive factors with optimal cutoff points23,24. In this study, we used the Gini index to split the nodes and predict the results hierarchically. The Gini index reflects how often a randomly chosen sample would be misclassified. The smaller the Gini index, the higher the purity of the node and the better the selected splitting variable25,26. In the DT model, delta-albumin was selected as the initial node. Among patients on the lower delta-albumin branch, those with delta-ALT less than 1.695 had a higher risk of DILI. This result is also clearly demonstrated in the individual SHAP force plots: at probability = 0.065, in patients with DILI, delta-albumin has the longest positively directed arrow, indicating that it makes the largest contribution to the occurrence of liver injury in pancreatic cancer patients. We infer that this result may also be related to the tumor properties of pancreatic cancer. As a malignant and aggressive tumor, pancreatic cancer can cause an amino acid imbalance in the body as it develops, not only changing albumin levels but also impairing the liver’s ability to synthesize proteins, including albumin, and thereby further aggravating liver disease27,28. Albumin changes are involved in various liver diseases; albumin dysfunction occurs earlier than changes in other factors and may be a novel diagnostic biomarker for early liver function impairment29. Pancreatic cancer has a remarkable predilection for the liver as a site of secondary tumor formation30. Therefore, dynamic testing of albumin levels during diagnosis and treatment is meaningful for pancreatic cancer patients prone to liver disease and even liver metastases.
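The Gini criterion mentioned in this paragraph can be sketched in a few lines. For a node with class proportions p_k, the Gini index is 1 - Σ p_k², and a candidate split is scored by how much it lowers the size-weighted Gini index of the child nodes. The function names and the toy labels below are illustrative assumptions, not the study's implementation.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a node: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Drop in Gini index when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)
    return gini_index(parent) - weighted_children


# Toy node: 1 = DILI, 0 = no DILI.
node = [1, 1, 0, 0]
print(gini_index(node))                      # evenly mixed node
print(gini_reduction(node, [1, 1], [0, 0]))  # a split that separates the classes completely
print(gini_reduction(node, [1, 0], [1, 0]))  # a split that does not help
```

At each node, CART evaluates the candidate cutoffs of every variable and keeps the one with the largest reduction, which is how a single variable ends up at the root.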
In addition, in the DT model, delta-ALT, post-GGT, and post (AST: ALT) produced larger reductions in the Gini index and could correctly classify the liver injury outcome. In the heatmap plot, delta-ALT and post-GGT are factors with important influences. ALT can be used to determine the severity of hepatocyte damage31. When hepatocytes are damaged, blood ALT levels are elevated, but in the absence of severe hepatocyte damage, ALT can remain normal or be only mildly elevated. The disease then continues to develop insidiously, which is extremely disadvantageous for patients32. A prospective study showed that using ALT > 3 × ULN as a diagnostic strategy identified a greater proportion of hospitalized patients with difficult DILI diagnoses33. We have reason to believe that changes in ALT may be a sensitive indicator for judging liver injury. Because some patients who did not reach the diagnostic threshold probably developed occult DILI, dynamic monitoring of these routine indicators is essential. Weber et al.34 reported that DILI should be considered, particularly in cases with a marked increase in GGT, an important indicator of liver function, even if conventional DILI threshold levels are not reached. This is in agreement with our findings.
In this study, the models based on posttreatment and delta features demonstrated striking predictive efficacy, with AUC values of 0.998 in the training set and 0.994 and 0.985 in the test set, respectively. This underscores the necessity of periodic liver function evaluations after antitumor therapy. In addition, excluding the pretreatment model, all models achieved an AUC exceeding 0.9 in both the training and test sets, which argues against substantial overfitting. This high predictive power may be related to the application of the decision tree model and to the pivotal liver function indices chosen to gauge hepatic injury in our study. Because standard indicators that define liver injury and liver dysfunction were used, this result is in line with our expectations. In the MLP model of Chambers et al.35, which predicted increases in bilirubin or creatinine in patients receiving chemotherapy, the training set had almost perfect sensitivity and specificity, above 0.95, with AUC = 0.99 (95% CI: 0.98–1.00) for creatinine and 0.97 (95% CI: 0.95–0.99) for bilirubin. In the study conducted by Vallée et al.36, the decision tree model successfully classified all 3 tumor types with AUCs of 0.98, 0.98 and 1.00. The predicted probabilities of the terminal nodes of the decision tree also reached 0.907 to 0.989. In contrast, the predictive performance of the pretreatment model was modest, which may be attributed to the lack of significant changes in liver function indicators.
This study has several limitations. First, its retrospective nature and relatively small sample size introduce the potential for selection bias. To enhance the model’s robustness and applicability, future studies could include prospective investigations and validation across multiple medical centers. Second, the stability of the decision tree model needs to be further strengthened; ensemble algorithms and radiomics could be used to construct the predictive model in future research37. Last, the scope of this study’s data remains limited, with numerous clinical and imaging indicators left unexplored. Collecting more comprehensive data pertinent to liver injury is a promising avenue for future exploration.
In summary, this study explored the factors associated with DILI occurrence in the pancreatic cancer population through a SHAP-interpretable decision tree model, constructed a personalized prediction tool, and offered the possibility of timely intervention for patients with liver injury. It may help optimize clinical treatment strategies and improve patient outcomes.
Data availability
The data that support the findings of this study are available from the authors, but some restrictions apply to protect patient confidentiality.
References
Shubert, C. R. et al. Overall survival is increased among stage III pancreatic adenocarcinoma patients receiving neoadjuvant chemotherapy compared to surgery first and adjuvant chemotherapy: An intention to treat analysis of the national cancer database. Surgery 160, 1080 (2016).
Duwe, G. et al. Hepatotoxicity following systemic therapy for colorectal liver metastases and the impact of chemotherapy-associated liver injury on outcomes after curative liver resection. Ejso-Eur. J. Surg. Onc. 43, 1668 (2017).
Shen, T. et al. Incidence and etiology of drug-induced liver injury in Mainland China. Gastroenterology 156, 2230 (2019).
Floyd, J., Mirza, I., Sachs, B. & Perry, M. C. Hepatotoxicity of chemotherapy. Semin. Oncol. 33, 50 (2006).
Wang, C. et al. Neoadjuvant therapy for pancreatic ductal adenocarcinoma: Where do we go?. Front. Oncol. 12, 828223 (2022).
Scheufele, F., Hartmann, D. & Friess, H. Treatment of pancreatic cancer-neoadjuvant treatment in borderline resectable/locally advanced pancreatic cancer. Transl. Gastroent. Hep. 4, 32 (2019).
Klaiber, U. & Hackert, T. Conversion surgery for pancreatic cancer-the impact of neoadjuvant treatment. Front. Oncol. 9, 1501 (2019).
Gangi, A. & Lu, S. C. Chemotherapy-associated liver injury in colorectal cancer. Ther. Adv. Gastroenter. 13, 320854238 (2020).
Yu, Y. C. et al. CSH guidelines for the diagnosis and treatment of drug-induced liver injury. Hepatol. Int. 11, 221 (2017).
Andrade, R. J. et al. EASL clinical practice guidelines: drug-induced liver injury. J. Hepatol. 70, 1222 (2019).
Senior, J. R. Monitoring for hepatotoxicity: What is the predictive value of liver “function” tests?. Clin. Pharmacol. Ther. 85, 331 (2009).
Teschke, R. & Frenzel, C. Drug induced liver injury: do we still need a routine liver biopsy for diagnosis today?. Ann. Hepatol. 13, 121 (2013).
Skrdla, P. J., Coscia, B. J., Gavartin, J., Browning, A. & Shelley, J. Drug aggregation of sparingly-soluble ionizable drugs: molecular dynamics simulations of papaverine and prostaglandin F2α. Mol. Pharm. 20(10), 5135–5147 (2023).
Zhou, L. Q. et al. Artificial intelligence in medical imaging of the liver. World J. Gastroentero. 25, 672 (2019).
Dickinson, Q. & Meyer, J. G. Positional SHAP (PoSHAP) for Interpretation of machine learning models trained from biological sequences. PLOS Comput. Biol. 18, e1009736 (2022).
Hammann, F. & Drewe, J. Decision tree models for data mining in hit discovery. Expert Opin. Drug Dis. 7, 341 (2012).
Churpek, M. M. et al. Multicenter comparison of machine learning methods and conventional regression for predicting clinical deterioration on the wards. Crit. Care Med. 44, 368 (2016).
Asai, Y., Ooi, H. & Sato, Y. Risk evaluation of carbapenem-induced liver injury based on machine learning analysis. J. Infect. Chemother. 29, 660 (2023).
Strobel, O. et al. Resection after neoadjuvant therapy for locally advanced, “unresectable” pancreatic cancer. Surgery 152, S33 (2012).
Bjornsson, E. S. et al. A new framework for advancing in drug-induced liver injury research. The prospective European DILI registry. Liver Int. 43, 115 (2023).
Chalasani, N. P., Maddur, H., Russo, M. W., Wong, R. J. & Reddy, K. R. ACG Clinical guideline: Diagnosis and management of idiosyncratic drug-induced liver injury. Am. J. Gastroenterol. 116, 878 (2021).
Han, Z. et al. Delta-radiomics models based on multi-phase contrast-enhanced magnetic resonance imaging can preoperatively predict glypican-3-positive hepatocellular carcinoma. Front Physiol. 14, 1138239 (2023).
Zhu, Y. & Wang, M. C. Obtaining optimal cutoff values for tree classifiers using multiple biomarkers. Biometrics 78, 128 (2022).
Shi, K. Q. et al. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees. J Viral Hepat. 24, 132 (2017).
Rutkowski, L., Jaworski, M., Pietruczuk, L. & Duda, P. A new method for data stream mining based on the misclassification error. IEEE T Neur. Net. Lear. 26, 1048 (2015).
Rau, C. S. et al. Identification of pancreatic injury in patients with elevated amylase or lipase level using a decision tree classifier: A cross-sectional retrospective analysis in a level I trauma center. Int. J. Environ. Res. Pub. Health 15(2), 277 (2018).
Davidson, S. M. et al. Direct evidence for cancer-cell-autonomous extracellular protein catabolism in pancreatic tumors. Nat. Med. 23, 235 (2017).
Katayama, K. Zinc and protein metabolism in chronic liver diseases. Nutr. Res. 74, 1 (2020).
Sun, L. et al. Impaired albumin function: a novel potential indicator for liver function damage?. Ann. Med. 51, 333 (2019).
Shi, H., Li, J. & Fu, D. Process of hepatic metastasis from pancreatic cancer: Biology with clinical significance. J. Cancer Res. Clin. 142, 1137 (2016).
Sookoian, S. & Pirola, C. J. Liver enzymes, metabolomics and genome-wide association studies: From systems biology to the personalized medicine. World J. Gastroentero. 21, 711 (2015).
Park, H. N. et al. Upper normal threshold of serum alanine aminotransferase in identifying individuals at risk for chronic liver disease. Liver Int. 32, 937 (2012).
M’Kada, H. et al. Real time identification of drug-induced liver injury (DILI) through daily screening of ALT results: A prospective pilot cohort study. PLOS ONE 7, e42418 (2012).
Weber, S., Allgeier, J., Denk, G. & Gerbes, A. L. Marked increase of gamma-glutamyltransferase as an indicator of drug-induced liver injury in patients without conventional diagnostic criteria of acute liver injury. Visc. Med. 38, 223 (2022).
Chambers, P. et al. Personalising monitoring for chemotherapy patients through predicting deterioration in renal and hepatic function. Cancer Med. 12(17), 17856–17865 (2023).
Vallée, R. et al. Machine learning decision tree models for multiclass classification of common malignant brain tumors using perfusion and spectroscopy MRI data. Front. Oncol. 13, 1089998 (2023).
Sun, S. et al. Development and validation of machine-learning models for the difficulty of retroperitoneal laparoscopic adrenalectomy based on radiomics. Front. Endocrinol. 14, 1265790 (2023).
Funding
The study was supported by the Education Department of Hangzhou City, Zhejiang Province Program (No. Y202249209).
Contributions
All authors contributed to the study’s conception and design. The first draft of the manuscript was written by Zhongyu Yuan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yuan, Z., Peng, J., Shu, Z. et al. Interpretable multitemporal liver function indicator model for prediction and risk factor analysis of drug induced liver injury. Sci Rep 14, 21285 (2024). https://doi.org/10.1038/s41598-024-66952-8
DOI: https://doi.org/10.1038/s41598-024-66952-8