Machine learning‑based prediction of survival prognosis in esophageal squamous cell carcinoma

Zhang, Kaijiong; Ye, Bo; Wu, Lichun; Ni, Sujiao; Li, Yang; Wang, Qifeng; Zhang, Peng; Wang, Dongsheng

doi:10.1038/s41598-023-40780-8

Published: 19 August 2023

Scientific Reports volume 13, Article number: 13532 (2023)

Abstract

The current prognostic tools for esophageal squamous cell carcinoma (ESCC) lack the accuracy needed to support individualized patient management. To address this issue, this study developed a machine learning (ML) prediction model for survival in ESCC patients. Six ML approaches, namely Rpart, Elastic Net, GBM, Random Forest, GLMboost, and the machine learning-extended CoxPH method, were employed to develop risk prediction models. The models were trained on a dataset of 1954 ESCC patients with 27 clinical features and validated on a dataset of 487 ESCC patients. The discriminative performance of the models was assessed using the concordance index (C-index). The best-performing model was used for risk stratification and clinical evaluation. The study found that N stage, T stage, surgical margin, tumor grade, tumor length, sex, MPV, AST, FIB, and Mg were the most important features for the survival of ESCC patients. The machine learning-extended CoxPH model, Elastic Net, and Random Forest performed similarly in predicting the mortality risk of ESCC patients and outperformed GBM, GLMboost, and Rpart. The risk scores derived from the CoxPH model effectively stratified ESCC patients into low-, intermediate-, and high-risk groups with distinctly different 3-year overall survival (OS) probabilities of 80.8%, 58.2%, and 29.5%, respectively. This risk stratification was also observed in the validation cohort. Furthermore, the risk model demonstrated greater discriminative ability and net benefit than the AJCC 8th edition stage, suggesting its potential as a prognostic tool for predicting survival events and guiding clinical decision-making. The classical CoxPH algorithm was also found to be sufficiently good for interpretive studies.

Introduction

Esophageal cancer (EC) is one of the most lethal malignancies worldwide with an extremely aggressive nature and low survival rate. According to global cancer statistics, there were an estimated 572,000 new cases and 509,000 deaths in 2018¹. In China, esophageal squamous cell carcinoma (ESCC) is the predominant histological type, accounting for approximately 90% of cases. ESCC is characterized by rapid progression and poor prognosis^2,3, with a 5-year survival rate of only 15.3% in advanced stages⁴. Despite advances in surgical techniques and the incorporation of multimodal therapies in recent years, the prognosis of ESCC remains unsatisfactory⁵. Certain biomarkers for the prediction of ESCC prognosis could play a fundamental role in the clinical management of each patient and have important implications regarding the choice of optimal medical therapy for secondary prevention^6,7,8,9. However, effective tools for clinical daily work are currently lacking. Therefore, there is an urgent need to identify novel prognostic biomarkers or develop an integrated prediction model for clinical prediction.

Clinical prediction models that integrate clinicopathological parameters, laboratory indexes, and survival outcomes using big data from large cohorts of patients have the potential to guide clinical decision-making and therapeutic prognoses^10,11,12. Despite significant efforts to explore the prognosis of ESCC, current prognostic models remain imperfect^13,14,15,16. Previous studies have mainly focused on the prognostic evaluation of a small number of clinical indicators using univariate and multivariate analysis^14,15,16,17. Furthermore, most ESCC prediction models have been developed using traditional statistical approaches such as CoxPH regression or logistic regression, without proper evaluation mechanisms to determine the best performing model prior to model building^13,14,15,16,17. Additionally, the sample sizes and assessed predictors in these studies are often limited, leading to poor reproducibility of model performance and insufficient evidence for clinical applications^14,15,16,17. Therefore, there is a need to develop more comprehensive and reproducible prediction models for ESCC that can be effectively used in clinical practice.

The emergence of machine learning has presented a potential solution to the issue of poor reproducibility in the development of clinical prediction models based on complex clinical information¹⁸. Machine learning is an interdisciplinary field that combines computer science and computational statistics to improve the efficiency of disease prognosis and therapeutic decision-making. Machine learning approaches can overcome some of the limitations of current analytical methods by utilizing computer algorithms to handle multi-dimensional variables, identify non-linear relationships between clinicopathological features and outcomes, and develop accurate prediction models more efficiently^11,19. Machine learning-based algorithms have been widely applied in medical science, particularly in predicting cancer diagnosis and prognosis¹⁸. For example, Abuhelwa et al.¹⁰ developed a machine learning model for survival prediction in urothelial cancer (UC) patients treated with atezolizumab, which found that the GBM model outperformed other models such as CoxBoost, random forest, and GLM in predicting patients' survival. D'Ascenzo et al.¹¹ also developed a PRAISE score based on four machine learning models for the prediction of 1-year post-discharge all-cause death, myocardial infarction, and major bleeding.

Developing an accurate prediction model is crucial for guiding clinical decision-making, and the key to achieving this is to identify the best-performing algorithm. To date, no studies have employed machine learning algorithms with laboratory indicators to predict prognosis in ESCC patients. Therefore, this study aims to develop a prognostic model using six different machine learning approaches, which could potentially be used to facilitate individualized patient management strategies.

Methods

Study cohort

This study investigated consecutive patients with newly diagnosed ESCC who underwent esophageal surgery at Sichuan Cancer Hospital between January 2009 and December 2017. The inclusion criteria were: (1) ESCC confirmed by postoperative histology, without distant metastasis, (2) non-cervical esophageal cancer, (3) no previous anticancer therapy, and (4) complete clinical, blood parameter, and follow-up data. The exclusion criteria were: (1) a history of other malignancies or perioperative mortality, (2) cancer invading the neck, (3) incomplete follow-up information, and (4) follow-up shorter than 6 months.

A total of 2441 ESCC patients were enrolled in the study and randomly divided into two datasets. The training cohort (80%) was utilized for model development and parameter tuning, while the testing cohort (20%) was employed for model validation. All patients included in the study were staged according to the American Joint Committee on Cancer (AJCC) 8th edition TNM classification system.

Predictors and outcomes

Among eligible cases, 27 predictors, comprising clinicopathological characteristics and laboratory indicators, were prospectively collected from medical records together with survival outcomes.

(1) Clinicopathological characteristics: age, sex, Karnofsky performance scale (KPS) score, tumor length, tumor grade, tumor location, vascular invasion, surgical margin, number of dissected lymph nodes (LN), nerve invasion, T stage, N stage, AJCC 8th edition stage, and treatment. The treatment options were surgery alone, or surgery followed by adjuvant chemotherapy (CT), radiotherapy (RT), or concurrent chemoradiotherapy (CCRT). The surgical methods were endoscopic (thoracoscopic or laparoscopic) surgery and thoracotomy. The concurrent chemotherapy regimens typically included platinum-based monotherapy, fluorouracil monotherapy, paclitaxel combined with a platinum agent, cisplatin combined with fluorouracil or capecitabine, paclitaxel combined with fluorouracil or capecitabine, and oxaliplatin combined with fluorouracil or capecitabine.

(2) Laboratory indicators: hematocrit (HCT), mean platelet volume (MPV), neutrophil-to-lymphocyte ratio (NLR), monocytes (MONO), eosinophils (EO), direct bilirubin (DBIL), albumin (ALB), aspartate aminotransferase (AST), alkaline phosphatase (ALP), sodium (Na), magnesium (Mg), fibrinogen (FIB), and lymphocyte-to-monocyte ratio (LMR).

The predicted outcome was overall survival (OS), defined as the time from the date of surgery to death or the last follow-up. The model's predictive ability was assessed at 1, 3, and 5 years.

Feature selection and importance

LASSO regularization and univariable Cox regression analysis were used for variable filtering. LASSO shrinks the absolute values of the coefficients and sets some of them exactly to zero, thereby removing the less important features from the model. This method has proven useful for feature selection in problems with a large number of covariates. Variables with p-values less than 0.05 in univariable Cox analyses were used for subsequent model development. The ranked importance of each feature was calculated using permutation importance, and the optimal features were extracted from the final model after tuning the model parameters with 10-fold cross-validation resampling and the sequential backward search method. If permuting the values of a feature reduces the discriminative power of the model, the feature is considered important because the model relies heavily on it to make predictions. High-ranked features are considered more relevant, and low-ranked features can be excluded.
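The permutation importance idea can be illustrated with a minimal, self-contained sketch. It is not the authors' mlr3 pipeline: the toy data, the `predict` function, and the column meanings below are hypothetical, and the score is Harrell's C-index computed directly in pure Python.

```python
import random

def c_index(times, events, risks):
    """Harrell's concordance index for right-censored data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

def permutation_importance(predict, X, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = c_index(times, events, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            score = c_index(times, events, [predict(r) for r in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical toy data: column 0 mimics an ordinal stage, column 1 is noise.
X = [[0, 5], [1, 3], [2, 8], [3, 1], [0, 7], [1, 2], [2, 6], [3, 4]]
times = [60, 40, 20, 10, 55, 35, 25, 12]   # months of follow-up
events = [0, 1, 1, 1, 0, 1, 1, 1]          # 1 = death observed, 0 = censored

def predict(row):
    # Hypothetical risk model that only uses column 0.
    return float(row[0])

baseline, importances = permutation_importance(predict, X, times, events)
print(baseline, importances)
```

In this toy setup the model ignores the noise column, so shuffling it leaves the C-index unchanged, while shuffling the stage-like column lowers it.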

Model development and validation

Six machine learning algorithms were used to fit models that predicted survival outcomes: Recursive Partitioning and Regression Trees (Rpart), Elastic Net Regularized Generalized Linear Models (Elastic Net), gradient boosted machine (GBM), random survival forest (randomForestSRC), Gradient Boosting with Component-wise Linear Models (GLMboost), and the machine learning-extended Cox proportional hazards (CoxPH) model. Rpart is a classification, regression, and survival tree algorithm based on recursive partitioning: it generates a tree structure by recursive binary partitioning of the data set, and each leaf node represents a category or a numerical value. In constructing the decision tree, Rpart considers several partition variables and split points, as well as pruning, so that the resulting model generalizes and predicts better²⁰. Elastic net regularization is a flexible compromise between ridge and lasso: it combines the L1 and L2 penalties, with a parameter called alpha controlling the mix. This method offers the strengths of both types of regularization, since the lasso penalty performs feature selection and aids interpretability while the ridge penalty allows a grouping effect²¹. GBM is a decision tree-based ensemble learning algorithm that improves predictive ability by iteratively training a series of decision trees. GBM performs well on many machine learning tasks, including classification, regression, and survival analysis²². Random survival forest is an extension of the random forest algorithm to survival analysis and predicts an individual's survival from a set of predictor variables. It has several advantages over other survival analysis methods: it can handle high-dimensional data, can capture complex non-linear relationships between the predictor variables and the survival outcome, and can handle missing data and censoring, which are common in survival analysis²³. GLMboost is a gradient boosting algorithm for regression and classification that uses component-wise (generalized) linear models as base learners. It progressively improves the predictive power of the underlying model while controlling model complexity through regularization. One advantage of GLMboost is that it can handle a wide range of data types, including categorical and continuous variables. It can also handle missing data, which is a common problem in real-world datasets²⁴. Cox proportional hazards (CoxPH) regression is a survival analysis method for estimating the effect of a factor on survival time. The CoxPH model assumes proportional hazards, i.e. that the effect of a factor on the hazard is constant over the entire observation period. CoxPH models can be used to analyze the incidence of illness, death, unemployment, and other events.
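For reference, the Cox model and the elastic net penalty described above can be written in standard textbook notation. The symbols are assumptions introduced here rather than taken from the paper: $x_i$ is the covariate vector, $t_i$ the follow-up time and $\delta_i$ the event indicator of patient $i$, $\lambda$ the penalty strength, and $\alpha$ the mixing parameter. The partial likelihood is shown without tie handling, and the scaling of the likelihood term varies between implementations.

```latex
% Cox proportional hazards model: baseline hazard h_0(t) scaled by a covariate effect
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Log partial likelihood over subjects with an observed event (\delta_i = 1)
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ \beta^{\top} x_i
  - \log \sum_{j:\, t_j \ge t_i} \exp\!\left(\beta^{\top} x_j\right) \right]

% Elastic net: penalized estimate with mixing parameter \alpha \in [0, 1]
\hat{\beta} = \arg\min_{\beta} \left\{ -\tfrac{1}{n}\,\ell(\beta)
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right] \right\}
```

Setting $\alpha = 1$ gives the lasso penalty and $\alpha = 0$ the ridge penalty. Because the hazard ratio between two patients, $\exp(\beta^{\top}(x_1 - x_2))$, does not depend on $t$, the covariate effect is constant over the observation period.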

Hyperparameter tuning for each model was conducted by grid search with 5-fold cross-validation using the mlr3tuning package. The hyperparameter search space was created with the paradox package. Each hyperparameter range was established and searched exhaustively to enhance the predictive performance of the models and ensure that they fit the data well. The specific hyperparameters for each model are shown in Table S1. For the meaning of each parameter, please refer to the rpart, gbm, glmnet, randomForestSRC, and mboost (which provides glmboost) packages. Model performance was evaluated by the average concordance index (C-index) on the training set, using grid search with 5-fold cross-validation repeated 20 times, and the best-performing model was selected for further study. The mlr3²⁵ package was used for model development and implementation.
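The tuning loop can be sketched generically as follows. This is an illustration in pure Python, not the mlr3tuning implementation: the grid values, the hyperparameter names `alpha` and `lambda`, and the `evaluate` callback are hypothetical stand-ins (the actual search spaces are in Table S1), and in practice `evaluate` would fit a survival model on the training indices and return its C-index on the held-out indices.

```python
import itertools
import random
from statistics import mean

def repeated_kfold(n, k=5, repeats=20, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, test

def grid_search(grid, evaluate, n, k=5, repeats=20, seed=0):
    """Score every grid point by its mean held-out score and return the best."""
    names = sorted(grid)
    splits = list(repeated_kfold(n, k, repeats, seed))
    results = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = mean(evaluate(params, tr, te) for tr, te in splits)
    best = max(results, key=results.get)
    return dict(zip(names, best)), results[best]

# Hypothetical grid; not the search space used in the study.
grid = {"alpha": [0.0, 0.5, 1.0], "lambda": [0.001, 0.01, 0.1]}

def evaluate(params, train_idx, test_idx):
    # Stand-in score that peaks at alpha=0.5, lambda=0.01. A real callback
    # would train on train_idx and compute the C-index on test_idx.
    return 0.7 - 0.1 * abs(params["alpha"] - 0.5) - abs(params["lambda"] - 0.01)

best_params, best_score = grid_search(grid, evaluate, n=100)
splits = list(repeated_kfold(100))
print(best_params, round(best_score, 3), len(splits))
```

With 5 folds repeated 20 times, each grid point is scored on 100 train/test splits, and every candidate is evaluated on the same splits so that the averages are comparable.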

The risk score of the final model was calculated to stratify patients into three risk groups (low, intermediate, and high), with thresholds reflecting clinically meaningful gradients in risk. Survival probabilities in the different patient groups were assessed using Kaplan-Meier curves with the R “survminer” package. Time-dependent receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), calibration curves, and decision curve analysis (DCA) were used to assess clinical utility.
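As a reminder of what a Kaplan-Meier curve estimates, here is a minimal pure-Python sketch of the product-limit estimator. It is not the survminer implementation, and the follow-up times and event indicators are hypothetical toy values.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve

def survival_at(curve, t):
    """Step-function lookup of the estimated survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s

# Hypothetical follow-up in months; 1 = death observed, 0 = censored.
times = [6, 12, 12, 20, 30, 36, 40, 50]
events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 36))
```

Censored patients contribute to the number at risk until their last follow-up but never reduce the survival estimate themselves, which is why the curve only steps down at observed deaths.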

Statistical analysis

The patient’s characteristics were described as number (%) for categorical variables and median (interquartile range [IQR]) or mean ± standard deviation (SD) for continuous variables, respectively. Categorical variables were compared using the Chi-square test or Fisher’s exact test when appropriate. The t-test was performed between parametric continuous variables, while the Mann-Whitney test or Kruskal-Wallis test was performed for non-parametric variables. All statistical analyses were performed using R software 4.1.3 (https://www.r-project.org/), and a two-sided p-value <0.05 was considered to indicate statistical significance.

Ethical approval and consent to participate

This study was approved by the ethics committee of Sichuan Cancer Hospital (Grant No. SCCHEC-02-2020-015) and was conducted in accordance with the Guidelines for Good Clinical Practice and the Declaration of Helsinki. The informed consent requirement was waived by the ethics committee of Sichuan Cancer Hospital due to the retrospective design of the study.

Results

Clinicopathological characteristics

A total of 2441 ESCC patients were enrolled according to the inclusion and exclusion criteria; 1954 patients were assigned to the training cohort and 487 to the validation cohort (Table 1). The median age of the included patients was 62.0 years (range, 34–90 years), and most patients were male (81.6%). The median follow-up time for OS was 28.23 months (range, 6.10–115.3 months).

Table 1 Baseline features of included cohorts in different data sets.

Model development of machine learning

To prevent overfitting or uncertainty in the model, we first examined the correlation between continuous variables using Spearman's method before developing the model. We observed slight collinearity between variables, as shown in Figure S1. We then used LASSO regression to penalize and select the optimal features, removing less important features from the model and reducing the correlation between variables. Ultimately, 22 variables were selected for model building with an optimal lambda.min of 0.00805, as shown in Fig. 1. Subsequent univariate Cox regression analysis identified 14 significant factors for predicting patients' overall survival: sex, KPS score, tumor length, tumor grade, surgical margin, vascular invasion, nerve invasion, T stage, N stage, MPV, AST, Na, Mg, and FIB (Table S2). These 14 variables were therefore selected for subsequent model development.
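How an L1 penalty removes features can be illustrated with the soft-thresholding operator, which is the lasso solution in the idealized case of standardized, uncorrelated predictors and also the per-coordinate update in coordinate descent. The sketch below is a simplification under that assumption, not the fitted model: only the penalty value 0.00805 comes from the text, and the feature names and coefficients are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso shrinkage operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

LAMBDA_MIN = 0.00805  # penalty strength reported in the text

# Hypothetical unpenalized coefficients, for illustration only.
coefs = {"feature_a": 0.412, "feature_b": -0.130, "feature_c": 0.006, "feature_d": -0.002}

shrunk = {name: soft_threshold(b, LAMBDA_MIN) for name, b in coefs.items()}
selected = [name for name, b in shrunk.items() if b != 0.0]
print(shrunk)
print(selected)
```

Coefficients whose magnitude is below the penalty are set exactly to zero and drop out of the model, while the remaining ones are shrunk toward zero by the same amount.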

Six different survival analysis algorithms were used for model development in the training set. The hyperparameter search space and tuning results are given in Table S1. The discriminative performance of the developed models was evaluated by the average C-index using grid search with fivefold cross-validation repeated 20 times. The results are presented in Fig. 2 and Table 2, which show that the machine learning-extended CoxPH model, Elastic Net, and Random Forest performed similarly in cross-validation, with a C-index of 0.731. Furthermore, their prediction performance was superior to that of GBM, GLMboost, and Rpart. Considering the importance of model interpretability, we ultimately selected the classical CoxPH regression algorithm as our final method for further study.

Table 2 Prediction performance of the machine learning methods.

Next, we used the permutation importance method to rank the importance of the 14 variables selected by the univariate Cox regression analysis; the results are presented in Fig. 3. N stage, T stage, surgical margin, MPV, and AST were identified as the top 5 predictors of survival events. The optimal model features were extracted after tuning the model parameters with tenfold cross-validation resampling using the sequential backward search method. The final 10 features selected for CoxPH model building were N stage, T stage, surgical margin, MPV, AST, tumor grade, sex, FIB, tumor length, and Mg.

To estimate the impact of each predictor on mortality risk in the CoxPH model, we display the marginal effects of each factor in Figure S2. Our results demonstrate that T stage and N stage are significant risk factors in the CoxPH model, with the risk of mortality increasing with higher T and N stages. Females exhibit a lower risk of mortality than males. Positive surgical margins and poor tumor grade increase the risk of mortality. Additionally, lower levels of MPV and Mg, greater tumor length, and higher levels of AST and FIB are associated with a greater risk of mortality in the model.

Machine learning model performance

With 10 prognostic features, patients were stratified into estimated risk deciles. We observed similar survival distributions for three risk scores and stratified the deciles of event probability into low, intermediate, and high-risk groups based on the related risks. The first to fourth deciles were classified as low-risk subgroups, with the percentage of observed death being significantly less than 25%. The eighth to tenth deciles were classified as high-risk subgroups, with the percentage of observed death exceeding 50%. The remaining groups were stratified into intermediate-risk groups (fifth to seventh deciles) (Fig. 4A,B).
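The decile-based grouping described above can be sketched as follows. The cut points (deciles 1 to 4 low, 5 to 7 intermediate, 8 to 10 high) come from the text; the rank-based decile assignment and the risk scores are hypothetical choices made for illustration.

```python
from collections import Counter

def risk_deciles(scores):
    """Assign each score a decile from 1 (lowest risk) to 10 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

def risk_group(decile):
    """Map a decile to the three risk groups used in the text."""
    if decile <= 4:
        return "low"
    if decile <= 7:
        return "intermediate"
    return "high"

# Hypothetical risk scores for 20 patients (higher = greater predicted risk).
scores = [0.31, 1.20, 0.05, 0.77, 2.10, 0.42, 1.85, 0.18, 0.93, 1.47,
          0.60, 0.27, 1.02, 2.45, 0.11, 1.66, 0.54, 0.85, 1.33, 0.70]

deciles = risk_deciles(scores)
groups = [risk_group(d) for d in deciles]
counts = Counter(groups)
print(counts)
```

With equal-sized deciles, this rule places 40% of patients in the low-risk group and 30% in each of the intermediate- and high-risk groups.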

Kaplan–Meier curve plots of survival probabilities revealed significant differences in survival rates among the high-, intermediate-, and low-risk subgroups in both the training and validation cohorts (Fig. 4C,D, all p < 0.0001). The risk stratification predicted 3-year overall survival probabilities of 80.8%, 58.2%, and 29.5% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 75.4%, 48.8%, and 26.9% in the validation cohort. In addition, the risk stratification predicted 5-year overall survival probabilities of 70.6%, 45.6%, and 18.7% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 65.3%, 27.9%, and 11.0% in the validation cohort (Table 3). The AUC values for 1-, 3-, and 5-year overall survival were 0.760, 0.735, and 0.746 in the training cohort, respectively, and a similar discriminative performance was observed in the validation cohort with AUC values of 0.725, 0.720, and 0.752 for 1-, 3-, and 5-year overall survival, respectively (Fig. 4E,F).

Table 3 3- and 5-year OS probabilities of the CoxPH model-based risk stratification in the training and validation cohorts.

We further evaluated the performance of the risk model by selecting the top 5 most important features (N stage, T stage, surgical margin, MPV, AST) from the permutation importance results for model development. Our findings demonstrate that the CoxPH risk model exhibits a significant advantage over the combination of these top 5 features, as well as individual features such as N stage (0.681), T stage (0.642), surgical margin (0.535), MPV (0.576), and AST (0.519) (Fig. 5).

Machine learning model evaluation

The machine learning-extended CoxPH risk model exhibits excellent predictive performance for survival events. However, it remains unclear whether the model can be used in clinical practice. Therefore, we compared the C-index values of the risk model and the AJCC 8th edition stage using fivefold cross-validation with 200 repeats. Additionally, we used calibration plots and DCA curves to evaluate the clinical utility of the model. Our results demonstrate that the risk model exhibits superior discriminative ability and net benefit over the AJCC 8th edition stage for all patients in both the training and validation cohorts (Fig. 6). The calibration curve revealed good agreement between predictions and actual observations for the probability of 1-, 3-, and 5-year survival (Fig. 7).
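The net benefit plotted in a decision curve can be illustrated in its simplest form. The sketch below assumes a binary outcome at a fixed time horizon with no censoring, which is a simplification of the survival setting analyzed in the paper; the predicted probabilities and outcomes are hypothetical.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical predicted death probabilities and observed outcomes (1 = died).
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]

nb_model = net_benefit(probs, outcomes, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all and the treat-none (net benefit of zero) strategies.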

The influence of treatment option on the model

In general, treatment options can affect the overall survival of patients. To clarify the impact of different treatment modalities on the overall survival of patients with ESCC, we compared overall survival among patients treated with surgery alone, CT, RT, and CCRT. We found no significant differences in overall survival among these treatment subgroups (Figure S3). We further evaluated the survival outcomes of ESCC patients who received surgery alone and found that overall survival was higher in patients who underwent endoscopic surgery than in those who underwent thoracotomy (Figure S4). We also investigated the impact of chemotherapy on the overall survival of ESCC patients who underwent surgery and found no significant differences among the different chemotherapy subgroups (Figure S5). These results suggest that ESCC patients who underwent endoscopic surgery may have had earlier-stage tumors or milder symptoms, while those who required thoracotomy may have had more advanced tumors. Patients who received thoracotomy may benefit from adjuvant radiotherapy or chemotherapy to improve their overall survival, achieving results similar to those of surgery alone.

Discussion

Machine learning approaches offer a technological innovation for personalized risk assessment¹¹. In this study, we utilized high-quality clinical and laboratory data from a cohort of 2441 ESCC patients to develop and evaluate prediction models for ESCC patients' survival. Our findings indicate that the machine learning-extended CoxPH model demonstrated the best performance for predicting overall survival in ESCC patients. The risk scores derived from the CoxPH model effectively stratified ESCC patients into three prognostic risk groups with distinct survival events. These clinically meaningful risk scores exhibited excellent discriminative abilities, outperforming TNM AJCC8th stage in predicting patients' mortality risks. Accurately predicting mortality risks in ESCC patients remains an unmet need, and to our knowledge, this is the first study to compare the performance of different machine learning algorithms for developing and validating survival-prediction models in ESCC patients.

The use of machine learning to analyze big data offers significant advantages for assimilating and evaluating complex healthcare data¹², and accurately forecasting cancer patients' survival is crucial for therapeutic decision-making and management^10,26,27. While most machine learning-based models have been applied for cancer diagnosis and risk assessment, their application in survival prediction has been limited²⁸. Furthermore, most machine learning-based survival analyses have been based on gene expression data from databases such as The Cancer Genome Atlas (TCGA)^18,29 or multi-omics data³⁰, with few studies utilizing high-dimensional real-world survival data^31,32, thus limiting their applicability to the current practice. Recent research by Abuhelwa et al.¹⁰ demonstrated the feasibility and effectiveness of machine learning-based approaches for survival prediction in urothelial cancer patients treated with atezolizumab. In this study, we employed six machine learning algorithms to develop a prognosis model for 27 clinical variables in ESCC patients and found that the machine learning-extended CoxPH model, Elastic Net, and Random Forest have similar and excellent performance in predicting ESCC patients' survival and outperformed GBM, GLMboost, and Rpart models. Therefore, machine learning-based approaches for ESCC patients' survival prediction are feasible and effective, and the classical algorithms of the CoxPH method remain sufficiently good for interpretive studies.

Several indicators or scores have been developed to estimate risk and guide the management of ESCC patients, based on research into predictors of survival^13,15,16,33. Previous studies have identified various factors associated with poor overall survival, including higher NLR and C-reactive protein-to-albumin ratio (CAR), perineural invasion, pathological stage, incomplete resection, and neoadjuvant therapy^33,34. Low preoperative serum sodium¹⁵ and low MPV³⁵ have also been confirmed as important risk factors for overall survival in ESCC patients, and a coagulation index built from PLT, MPV, and FIB could stratify patients into three risk groups, with 3-year OS rates of 63.5%, 55.5%, and 43.1% for the low-, middle-, and high-risk groups, respectively¹³. In this study, we identified N stage, T stage, surgical margin, MPV, AST, tumor grade, sex, FIB, tumor length, and Mg as the most important features for predicting survival events. Higher T and N stages, positive surgical margins, and poor tumor grade were associated with increased mortality risk, while females had a lower risk of mortality than males. Additionally, lower levels of MPV and Mg, greater tumor length, and higher levels of AST and FIB were associated with a greater risk of mortality. Monitoring these routine clinical indicators can help predict prognostic risk and assist in clinical management strategies for ESCC patients. However, some previous findings may be biased due to small sample sizes or different methodologies³⁶. Nevertheless, CoxPH risk scores derived from machine learning processes and large contemporary patient cohorts have the potential to overcome the shortcomings of existing predictors.

This study has several limitations that should be acknowledged. Firstly, it is an observational retrospective study, and the population included in the study is primarily concentrated in the Asian population, which could potentially introduce selection bias in model building. Additionally, the endpoint of our study was overall survival, and the prediction value for progression-free survival or disease-free survival remains unknown. Therefore, the efficiency of this model requires further systematic validation on larger cohorts by multicenter studies. In conclusion, we have developed and validated a machine learning risk model that can serve as a prognosis tool for predicting the survival of ESCC patients. Furthermore, the classical algorithms of CoxPH method remain sufficiently good for interpretive studies, and machine learning-based approaches are feasible for enhancing the optimization of disease prognosis and clinical decision-making.

Data availability

The authors declare that all data generated or analyzed for this study are available within the paper and its supplementary information. Additional raw data are available from the corresponding author upon reasonable request.

References

Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA 68, 394–424. https://doi.org/10.3322/caac.21492 (2018).
Zhou, M. et al. Cause-specific mortality for 240 causes in China during 1990–2013: a systematic subnational analysis for the Global Burden of Disease Study 2013. The Lancet 387, 251–272. https://doi.org/10.1016/S0140-6736(15)00551-6 (2016).
Article Google Scholar
Liang, H., Fan, J. H. & Qiao, Y. L. Epidemiology, etiology, and prevention of esophageal squamous cell carcinoma in China. Cancer Biol. Med. 14, 33–41. https://doi.org/10.20892/j.issn.2095-3941.2016.0093 (2017).
Article PubMed PubMed Central Google Scholar
Chitti, B. et al. Temporal changes in esophageal cancer mortality by geographic region: A population-based analysis. Cureus 10, e3596. https://doi.org/10.7759/cureus.3596 (2018).
Article PubMed PubMed Central Google Scholar
Baba, Y. et al. Clinical and prognostic features of patients with esophageal cancer and multiple primary cancers: A retrospective single-institution study. Ann. Surg. 267, 478–483. https://doi.org/10.1097/sla.0000000000002118 (2018).
Article PubMed Google Scholar
Liang, S. et al. A nomogram to predict short-term outcome of radiotherapy or chemoradiotherapy based on pre/post-treatment inflammatory biomarkers and their dynamic changes in esophageal squamous cell carcinoma. Int. Immunopharmacol. 90, 107178. https://doi.org/10.1016/j.intimp.2020.107178 (2021).
Article CAS PubMed Google Scholar
Lian, L. et al. Development and verification of a hypoxia- and immune-associated prognosis signature for esophageal squamous cell carcinoma. J. Gastrointest. Oncol. 13, 462–477. https://doi.org/10.21037/jgo-22-69 (2022).
Article PubMed PubMed Central Google Scholar
Liu, T. et al. Development of a novel serum exosomal MicroRNA nomogram for the preoperative prediction of lymph node metastasis in esophageal squamous cell carcinoma. Front. Oncol. 10, 573501. https://doi.org/10.3389/fonc.2020.573501 (2020).
Article PubMed PubMed Central Google Scholar
Min, B. H. et al. Nomogram for prediction of lymph node metastasis in patients with superficial esophageal squamous cell carcinoma. J. Gastroenterol. Hepatol. 35, 1009–1015. https://doi.org/10.1111/jgh.14915 (2020).
Abuhelwa, A. Y. et al. Machine learning for prediction of survival outcomes with immune-checkpoint inhibitors in urothelial cancer. Cancers 13, 2001. https://doi.org/10.3390/cancers13092001 (2021).
D’Ascenzo, F. et al. Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): A modelling study of pooled datasets. Lancet 397, 199–207. https://doi.org/10.1016/s0140-6736(20)32519-8 (2021).
Ngiam, K. Y. & Khor, I. W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273. https://doi.org/10.1016/s1470-2045(19)30149-4 (2019).
Wang, Q. et al. Development and validation of a practical prognostic coagulation index for patients with esophageal squamous cell cancer. Ann. Surg. Oncol. 28, 8450–8461. https://doi.org/10.1245/s10434-021-10239-z (2021).
Song, Q., Wu, J. Z., Wang, S. & Chen, W. H. Elevated preoperative platelet distribution width predicts poor prognosis in esophageal squamous cell carcinoma. Sci. Rep. 9, 15234. https://doi.org/10.1038/s41598-019-51675-y (2019).
Wang, Q. et al. Preoperative serum sodium level as a prognostic and predictive biomarker for adjuvant therapy in esophageal cancer. Front. Oncol. 10, 555714. https://doi.org/10.3389/fonc.2020.555714 (2020).
Zhang, H. et al. The predictive value of a preoperative systemic immune-inflammation index and prognostic nutritional index in patients with esophageal squamous cell carcinoma. J. Cell. Physiol. 234, 1794–1802. https://doi.org/10.1002/jcp.27052 (2019).
Li, J. et al. A nutrition and inflammation-related nomogram to predict overall survival in surgically resected esophageal squamous cell carcinoma (ESCC) patients. Nutr. Cancer 74, 1625–1635. https://doi.org/10.1080/01635581.2021.1957131 (2022).
Li, M. X. et al. Using a machine learning approach to identify key prognostic molecules for esophageal squamous cell carcinoma. BMC Cancer 21, 906. https://doi.org/10.1186/s12885-021-08647-1 (2021).
Schwalbe, N. & Wahl, B. Artificial intelligence and the future of global health. Lancet 395, 1579–1586. https://doi.org/10.1016/s0140-6736(20)30226-9 (2020).
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and Regression Trees. https://doi.org/10.1201/9781315139470 (1984).
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22. https://doi.org/10.18637/jss.v033.i01 (2010).
Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2 (2002).
Zhou, L., Wang, H. & Xu, Q. Random rotation survival forest for high dimensional censored data. Springerplus 5, 1425. https://doi.org/10.1186/s40064-016-3113-5 (2016).
Bühlmann, P. & Yu, B. Boosting with the L2 loss. J. Am. Stat. Assoc. 98, 324–339. https://doi.org/10.1198/016214503000125 (2003).
Lang, M. et al. mlr3: A modern object-oriented machine learning framework in R. J. Open Source Softw. https://doi.org/10.21105/joss.01903 (2019).
Ding, D. et al. Machine learning-based prediction of survival prognosis in cervical cancer. BMC Bioinformatics 22, 331. https://doi.org/10.1186/s12859-021-04261-x (2021).
Howard, F. M., Kochanny, S., Koshy, M., Spiotto, M. & Pearson, A. T. Machine learning-guided adjuvant treatment of head and neck cancer. JAMA Netw. Open 3, e2025881. https://doi.org/10.1001/jamanetworkopen.2020.25881 (2020).
Gould, M. K., Huang, B. Z., Tammemagi, M. C., Kinar, Y. & Shiff, R. Machine learning for early lung cancer identification using routine clinical and laboratory data. Am. J. Respir. Crit. Care Med. 204, 445–453. https://doi.org/10.1164/rccm.202007-2791OC (2021).
Yu, J. et al. Characterization of a five-microRNA signature as a prognostic biomarker for esophageal squamous cell carcinoma. Sci. Rep. 9, 19847. https://doi.org/10.1038/s41598-019-56367-1 (2019).
Poirion, O. B., Jing, Z., Chaudhary, K., Huang, S. & Garmire, L. X. DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data. Genome Med. 13, 112. https://doi.org/10.1186/s13073-021-00930-x (2021).
Li, Z. et al. A novel prognostic scoring system of intrahepatic cholangiocarcinoma with machine learning basing on real-world data. Front. Oncol. 10, 576901. https://doi.org/10.3389/fonc.2020.576901 (2020).
Spooner, A. et al. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Sci. Rep. 10, 20410. https://doi.org/10.1038/s41598-020-77220-w (2020).
Ishibashi, Y., Tsujimoto, H., Yaguchi, Y., Kishi, Y. & Ueno, H. Prognostic significance of systemic inflammatory markers in esophageal cancer: Systematic review and meta-analysis. Ann. Gastroenterol. Surg. 4, 56–63. https://doi.org/10.1002/ags3.12294 (2020).
Kim, H. E., Park, S. Y., Kim, H., Kim, D. J. & Kim, S. I. Prognostic effect of perineural invasion in surgically treated esophageal squamous cell carcinoma. Thorac. Cancer 12, 1605–1612. https://doi.org/10.1111/1759-7714.13960 (2021).
Liu, X. et al. Adjuvant chemotherapy for lymph node positive esophageal squamous cell cancer: The prediction role of low mean platelet volume. Front. Oncol. 12, 1067682. https://doi.org/10.3389/fonc.2022.1067682 (2022).
Ishibashi, Y. et al. Prognostic value of platelet-related measures for overall survival in esophageal squamous cell carcinoma: A systematic review and meta-analysis. Crit. Rev. Oncol. Hematol. 164, 103427. https://doi.org/10.1016/j.critrevonc.2021.103427 (2021).

Funding

This research was funded by the Sichuan Provincial Cadre Health Research Project (Chuan Gan Yan 2022–802), the Sichuan Science and Technology Program, China (2021JDRC0152, 2022YFS0006, 2023YFS0488, 2023YFQ0055), and the Chengdu Science and Technology Bureau Project (2021-YF05-01792-SN).

Author information

These authors contributed equally: Kaijiong Zhang and Bo Ye.

Authors and Affiliations

Department of Clinical Laboratory, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
Kaijiong Zhang, Bo Ye, Lichun Wu, Sujiao Ni & Dongsheng Wang
Department of Radiation Oncology, Sichuan Clinical Research Center for Cancer, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Affiliated Cancer Hospital of University of Electronic Science and Technology of China, Chengdu, China
Qifeng Wang
Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Yang Li & Peng Zhang

Contributions

Z.K.J. and W.D.S. conceived the project and designed the study. Z.K.J., W.D.S., and W.Q.F. designed the experiment. Y.B., W.L.C., and N.S.J. collected the data. Z.K.J. wrote the manuscript. Z.K.J., Y.L., P.Z., W.D.S., and W.Q.F. discussed and revised the manuscript. All authors have read and approved the final version to be published.

Corresponding authors

Correspondence to Qifeng Wang, Peng Zhang or Dongsheng Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, K., Ye, B., Wu, L. et al. Machine learning‑based prediction of survival prognosis in esophageal squamous cell carcinoma. Sci Rep 13, 13532 (2023). https://doi.org/10.1038/s41598-023-40780-8

Received: 19 September 2022
Accepted: 16 August 2023
Published: 19 August 2023
DOI: https://doi.org/10.1038/s41598-023-40780-8
