Abstract
Endometrial carcinoma (EC) is one of the most common gynecological cancers in the world. In this work we apply Cox proportional hazards (CPH) and optimal survival tree (OST) algorithms to the retrospective prognostic modeling of disease-specific survival in 842 EC patients. We demonstrate that linear CPH models are preferred for the EC risk assessment based on clinical features alone, while interpretable, non-linear OST models are favored when patient profiles can be supplemented with additional biomarker data. We show how visually interpretable tree models can help generate and explore novel research hypotheses by studying the OST decision path structure, in which L1 cell adhesion molecule expression and estrogen receptor status are correctly indicated as important risk factors in the p53 abnormal EC subgroup. To aid further clinical adoption of advanced machine learning techniques, we stress the importance of quantifying model discrimination and calibration performance in the development of explainable clinical prediction models.
Similar content being viewed by others
Introduction
Endometrial carcinoma (EC) is the most common gynecologic malignancy in the OECD member states. In 2020, 417,000 new cases and 97,370 deaths have been attributed to the EC worldwide, which is a 10% increase in incidence and an 8% increase in mortality since 2018. Both metrics vary considerably geographically and across patients’ socioeconomic strata1,2. In the UK, the expected 5-year survival is 77%, with 85% for stage I disease and 25% for stage IV3. EC treatment options depend on tumor staging and histological findings, which are prone to misdiagnosis4. Addition of molecular profiling information to histological features has been shown to improve patient stratification and subsequent selection of adjuvant therapies5,6,7,8,9. To further improve the EC risk assessment, it is important to develop transparent computational models that utilize both clinical and molecular patient profiles.
Two commonly used statistical methods in the survival analysis of EC patients are the Kaplan–Meier method and the Cox proportional hazards (CPH) regression. The Kaplan–Meier method is used to approximate cumulative survival probability (survival function) from lifetime and censored data10. It is well-suited to summarize survival functions from full cohorts, and it allows for their visual analysis. The CPH regression is the most popular model for the analysis of survival data when multiple variables are available11. Its utility is limited due to the CPH assumptions, such as the linearity and additivity of predictor variables, as well as the methodological difficulties related to variable selection. Machine learning (ML), such as deep learning and ensemble models, improve on these shortcomings. They have been shown to perform particularly well with high-dimensional datasets, such as -omics readouts, electronic health records, and high content imaging12,13. Deep learning and ensemble ML models have also been applied to prognostic prediction modeling of patient outcomes in the EC14,15,16,17. However, these ML models still see limited use in the clinical practice18. Their poor adoption may be attributed to the black-box nature that complicates model interpretability, a high risk of bias, and the need for larger training datasets to achieve similar performance, as compared to linear Cox regression19.
Tree-based ML methods have been used to account for non-linear effects and variable interactions in survival analysis20. Tree-based ML methods are interpretable by design as every prediction made by a trained model can be associated with a corresponding decision path, and the hierarchical structure of the model as a whole can be easily visualized. Further, they can take into account factors that may act differently in patient subgroups, unlike linear models that favor global factors with uniform effects across entire patient cohorts21. There are several variants of decision trees that can be used to estimate patient risks, such as the CART model proposed by Breiman et al. or the conditional inference tree model by Hothorn et al.22,23. While decision trees can be ensembled leading to better performance than single trees, like in the random survival forest algorithm by Ishwaran et al., this makes them considerably less interpretable24,25. In light of recent research advances aimed at improving decision tree algorithms through better splitting and pruning criteria, single decision tree models are a good alternative to the CPH regression in the development of explainable clinical prediction models26,27.
In this retrospective study we explore a cohort of 842 EC patients with 43 clinicopathological and molecular features collected at the Helsinki University Hospital between 2007 and 2012. We report two interpretable models that predict disease-specific survival: a multivariable CPH regression and a visually interpretable optimal survival tree (OST)27. Both are built on two sets of variables: a clinical set and an extended set, which enriches the former with biomarker data, namely CD171 (L1CAM, L1 cell adhesion molecule expression), estrogen receptor (ER) status, peritoneal washing and tumor size. We use Harrell’s time-independent concordance index (C-index) and time-dependent integrated Brier score (IBS) to compare model performance28. These two measures report related, but distinct performance metrics, as C-index quantifies discrimination, or how well a model separates low-risk from high-risk patients, while IBS also quantifies calibration, which is the extent of an agreement between observed outcomes and model predictions29. In this work we show that to select an optimal EC prognostic model, a discrimination measure should be supplemented with a calibration measure, such as IBS30,31,32,33. We find that the CPH models trained on the clinical variables have a higher C-index than the OST models, whereas the IBS scores of both model types are comparable. Extending clinical data with biomarker information improves the discrimination and calibration performance in both model types, with a larger improvement and the overall best C-index and IBS scores seen in the OST models. Finally, we suggest that the Cox proportional hazards regression should be used in the EC risk assessment based on clinical data only, while optimal survival trees are preferred when biomarker information is available.
Materials and methods
Study cohort
This retrospective analysis is based on a cohort of 842 patients with unselected EC that underwent surgical treatment between 2007 and 2012 at the Helsinki University Hospital. The follow-up time ranges from 1 to 136 months with a median of 82 months. In total, 591 (70.2%) patients survived until the end of the study, 148 (17.6%) died from the EC, 103 (12.2%) died from other causes. The endpoint of interest is disease-specific survival. Based on tumor molecular profiles derived through The Cancer Genome Atlas project, 604 (71.7%) patients were assigned to one of four ProMisE classes, for the remaining 238 (28.2%) patients the ProMisE categories were not assigned experimentally5,6. Four categories are: (a) mismatch repair deficient (MMRd), (b) no specific molecular profile (NSMP), (c) p53 abnormal and (d) polymerase-ε hypermutated (POLE). Among 604 patients that have ProMisE classes assigned to them, 74 died due to other causes and 30 belong to the POLE subgroup, where no one died from the EC. Each patient is described with a feature vector consisting of 43 variables, out of which 33 are categorical and 10 are numeric. Please refer to the Supplementary Materials—Extended variable information for a more detailed variable description.
Data preprocessing
All numeric variables, except for age and BMI, are winsorized at the 99% level to limit the effect of extreme values using the quantile function derived via the inverse of an empirical distribution function34. Variables with more than five categories, such as FIGO stage, or those with unbalanced class proportions, such as adjuvant therapy status, are simplified by combining subcategories together.
We impute missing values to prevent the exclusion of observed data35. Missing values are imputed using the multivariate imputation by chained equations method, where numerical and binary variables are predicted with random forest models consisting of 100 decision trees, unordered categorical data with more than two levels are imputed with the polytomous regression, and ordered categorical variables with more than two levels are imputed with the proportional odds model36. Variables are imputed in the order of low to high proportion of missingness. R mice package version 3.14.7 is used to generate 120 imputed datasets, which are subsequently merged by taking mean values for the numeric variables and mode values for categorical variables37. The response variable is kept throughout the imputation38.
Finally, to select variables for the CPH regression models we compare the distributions of numerical and binary categorical variables, stratified by the response. We apply the Pearson correlation coefficient to identify collinear numerical variables, and Goodman and Kruskal’s lambda to identify associated categorical variables. Our primary goal is to optimize the CPH regression performance. Therefore, simplification of categorical data and variable selection in the subsequent steps are iterated several times. We use the analysis of deviance test to compare nested CPH models, while the Akaike information criterion is preferred for the comparison of non-nested models.
The complete experimental pipeline is shown in Fig. 1.
Statistical modeling
We train two types of interpretable models to predict individual survival probabilities for the full patient cohorts, and the subcohorts stratified by the ProMisE classes. We assess model performance using C-index and IBS. We estimate 95% confidence intervals for the performance metrics by 1000 repetitions of the ordinary bootstrap with replacement. We also report the performance of seven additional survival analysis models using the C-index metric in the Supplementary Materials—Additional ML models section. We follow the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative hoping to decrease reporting bias and enable model interoperability39.
Cox proportional hazards model
Survival CPH regression is defined as a product of a non-parametric hazard function λ(t) and the eXβ term, where t is time, X is a vector of variables describing a patient, and β is a vector of the model's coefficients. The λ(t) part of the CPH model is identical for all patients at a time t. It is referred to as a hazard function of a standard patient, which is a patient with Xβ = 0. The second term is patient-specific, and it is used to calculate a hazard ratio without knowing the hazard function λ(t), where the hazard ratio is the risk of death in relation to a control40. We use the Breslow method, as implemented in R riskRegression package version 2022.03.22, to specify the hazard function, which is required for estimating individual survival probabilities41. We use Schoenfeld residuals to test for the proportional hazards assumption, as implemented in R survival package version 3.3.1. We estimate CPH model parameters by maximizing the partial log-likelihood.
Optimal survival tree model
We use the optimal survival tree (OST) method to develop interpretable decision tree models for estimating patient survival probabilities. The OST algorithm creates multiple candidate decision trees and optimizes their variable splitting thresholds one variable at a time using coordinate descent42. The main idea is to use previously optimized parameters in subsequent splitting criteria updates, ultimately outputting a single decision tree that can be visually examined. The OST loss function compares how close the predicted eXβ terms for each patient are to the cumulative survival probabilities, obtained by the Nelson-Aalen estimator27. We prioritize model robustness in the training process by: (a) limiting the tree size, since too deep or too wide trees obfuscate the model interpretability, (b) increasing the number of random restarts to use in the local search algorithm, and (c) controlling the minimum number of points that must be present in every leaf node of the fitted trees. The complexity parameter that determines the tradeoff between the accuracy and model complexity is tuned automatically by assessing the out-of-sample performance. The patient cohorts for the OST model training are split, such that the complete cases are used for model fitting, and the imputed subsets are used for validation. The final validated models are then retrained on the combined (complete case and imputed) patient cohorts. We fit the OST models, as implemented in R iai package version 1.7.0, using the log-likelihood criterion43.
Model performance metrics
C-index reports model discrimination performance, i.e. the model’s ability to predict correct rankings of the survival times. C-index is defined as a ratio of concordant pairs of subjects to the total number of comparable pairs. A pair is concordant when a subject with shorter survival time is estimated to have a higher risk than the one with longer survival time. A pair is comparable if (a) it is possible to determine which subject experienced the event first or (b) a subject with a shorter survival time experienced an event, while the other one is censored and is not lost to follow-up yet. C-index ranges between 0 and 1, where higher values are better.
IBS reports both model discrimination and calibration, i.e. the extent of an agreement between observed outcomes and model predictions44. Brier score is defined as a mean squared difference between event indicators and predicted survival probabilities at a time t28. By summing Brier scores over a time interval we obtain the integrated Brier score (IBS), which is then adjusted for patients lost to follow-up using the inverse probability censoring weighting method45. We use R pec package version 2022.05.04 to compute IBS at 12, 24, 60, and 136 months based on the predicted individual survival probabilities of patients. IBS ranges between 0 and 1, where lower values are better.
Computational resources
All computations are performed using R 4.2.0 on MacOS 12.5 and Python 3.9.7 on Ubuntu 20.04 LTS.
Institutional review board statement
This study was approved by the Institutional Review Board of the Helsinki University Hospital (journal number 135/13/03/03/2013) and conducted according to the guidelines of the Declaration of Helsinki.
Informed consent
Participant informed consent was waived because this was a retrospective study. The Institutional Review Board of the Helsinki University Hospital called for an approval by the National Supervisory Authority for Welfare and Health, which was granted (journal number 753/06.01.03.01/2016).
Results
The initial cohort consists of 842 patients diagnosed with unselected endometrial carcinoma. Following the missing value imputation, excluding subjects that died due to other causes (n = 103) and those that belong to the POLE group (n = 39), where no one died, the final analysis cohort consists of 700 patients. Among 700 patients in the final cohort, 305 (43.6%) belong to the MMRd subgroup, 308 (44%) belong to the NSMP subgroup and 87 (12.4%) belong to the p53ab subgroup. Majority of the tumors are histopathological grade 1–2 (74%) and FIGO stage I disease (73%). The median follow-up time for censored cases is 92 (interquartile range, 78–122) months. There are 182 subjects who had disease recurrence and 147 that died during the follow-up time. Patient demographics are shown in Table 1.
The multivariable CPH models are compared with the OST models in prediction of the disease-specific survival using two feature sets in four patient cohorts. Variable selection for both feature sets is performed to optimize the CPH discrimination performance. Subsequently, the OST models are fit on the selected feature sets. The feature set I (FSI) consists of seven variables: age, FIGO stage, histological subgroup, ProMisE, deep myometrial invasion, lymphovascular invasion, and tumor diameter > 3 cm. The feature set II (FSII) adds four more variables to the FSI, namely tumor diameter > 5 cm, peritoneal washing status, ER status and CD171 expression (L1CAM, postoperative L1 cell-adhesion molecule expression status).
Model discrimination
The C-index scores of the CPH and OST models with the corresponding 95% confidence intervals are shown in Table 2.
Model discrimination performance is improved by the inclusion of four additional biomarker variables, as indicated by higher C-index scores in the FSII versus FSI feature sets. The OST models trained on the FSII report the highest overall C-index in all ProMisE subcohorts, but the NSMP. Where the CPH model trained on the FSI has the best C-index of 0.8376, followed by the OST model with the C-index of 0.8368. We note that the CPH models trained on the FSI feature set report on average 2.2% higher C-index than the OST models. This trend is reversed in the FSII, where the OST models report on average 2.5% better C-index than the CPH models. The largest C-index increase in the OST models is 10.8% in the MMRd and 8.7% in the p53ab subcohorts, while in the CPH models it is 1.4% in the p53ab subcohort. Overall, non-linear optimal survival tree models benefit more from the additional biomarker data than the linear Cox proportional hazards models.
Model calibration
We report the IBS scores with the 95% confidence intervals for both the CPH and the OST models at 12, 24, and 60 months, and the overall IBS at 136 months of follow-up in Fig. 2 and Supplementary Table 1.
All models across all cohorts and feature sets show better (i.e. lower) IBS at shorter follow-up times, e.g. the IBS scores at 12 months are up to an order of magnitude lower than at 136 months of follow-up. Both OST and CPH model types generally report better IBS scores when trained on a larger feature set (FSII), as compared with the models trained on FSI. The OST models trained on the FSII report 15.4% better IBS at 1 year, 21.6% better IBS at 2 years, 21.0% better IBS at 5 years and 16.2% better IBS at the complete follow-up, as compared with the FSI-trained OST models. The IBS improvements for the CPH models trained on the FSII are 5.7% at 1 year, 7.5% at 2 years, 3.9% at 5 years and 4.4% at the complete follow-up, as compared with the FSI-trained models. Both model types are on par with each other on the FSI feature set, however, the OST models have better IBS scores than the CPH models on the FSII set. The OST models improve more from the additional biomarker data than the CPH models.
Model interpretation
The hazard ratios with the corresponding 95% confidence intervals of the CPH models trained on a full cohort on two feature sets are in Table 3. It is important to note that the interpretation of the CPH model coefficients should be performed when the proportional hazards (PH) assumption is satisfied. We found evidence that according to the Schoenfeld residual test “non-endometrioid” and “estrogen receptor positive” terms do not satisfy the PH assumption in the CPH models built on the FSI and FSII feature sets. Upon the visual inspection, the violations are minor for both. Further, since both model types pass the global PH test with p values of 0.245 and 0.25, respectively, we deem it appropriate to ignore the PH violations.
Age, more advanced disease stages, larger tumor sizes, deep myometrial invasion, lymphovascular space invasion, positive peritoneal washing, negative ER status and positive CD171 are associated with poor survival46,47. The MMRd and p53ab classes are identified as more aggressive EC forms than the NSMP class, with the HR of 1.61 and 2 (1.8 and 1.88 in the FSII), respectively. Similarly, histological subgroup G3 is associated with a higher risk of death than the G1-G2 subgroup with the HR of 2.04 on the FSI and 1.66 on the FSII. Interestingly, the non-endometrioid EC subgroup is not robustly associated with a higher risk in either FSI or FSII feature sets, with HR of 1.35, p value 0.24 and HR of 0.95, p value 0.87, respectively. This ambiguity in assessing the survival differences between type I and type II tumors has been previously reported in the literature48,49.
We next explore how the tree models may supplement conventional linear methods in the interpretation of EC risk factors by studying the OST and CPH model types trained on the p53ab subcohort and the FSII feature set. We focus on the p53ab subgroup (n = 87), as it shows the largest relative improvement in the C-index from the additional biomarker data in the CPH models (0.7636 vs 0.7744) and the second largest in the OST models (0.7246 vs 0.7936). The CPH IBS values improve by 3% in FSII, whereas for the OST model the improvement is 19%. The HR scores with the 95% confidence intervals of the FSII-trained CPH model are in Table 4. The decision tree for the FSII-trained OST model is in Fig. 3.
The p53ab CPH model reports a relatively high C-index of 0.7744, but the model coefficients are not always informative and require additional validation. For instance, HR 95% confidence intervals of the “FIGO stage” terms do not have an upper bound and all the coefficients’ p values are above the 0.05 threshold of statistical significance (Table 4). Therefore, it is not advised to use this model as is for the downstream tasks that require model interpretation, such as designing nomograms to estimate event probabilities in a clinical setting. The p53ab OST model reports a C-index of 0.7936, and the decision tree path recapitulates some of the existing clinical knowledge (Fig. 3). The OST model selects the CD171 status (L1CAM, L1 cell adhesion molecule expression) as the most informative variable to stratify the cohort on and marks the estrogen receptor status as an important risk factor in the p53ab group. Significance of the ER and CD171 biomarkers in the non-endometrioid p53 aberrant tumors, which are overrepresented in the p53ab ProMisE subcohort with 49.5% of subjects belonging to the non-endometrioid EC subtype versus 11.7% in the full cohort, has been previously reported47,50,51. This analysis demonstrates how tree-based ML can supplement and even supersede conventional Cox regression in the EC risk assessment if routinely collected clinical data can be enriched with biomarker information.
Discussion
We have trained the CPH and OST models on the full, MMRd, NSMP and p53ab ProMisE subcohorts using clinical and extended feature sets. Linear CPH and non-linear OST models trained on seven clinical variables report comparable discrimination performance with the C-index of 0.8425 vs 0.8493 in the complete cohort, and 0.8376 vs 0.8368 in the NSMP subcohort. Model calibration scores are also similar with a 5-year IBS of 0.0677 vs 0.0666 in the full cohort and 0.0309 vs 0.0314 in the NSMP subcohort. In contrast, the CPH models have a better discrimination performance than the OST models with the C-index of 0.82 vs 0.7886 in the MMRd subcohort, and 0.7636 vs 0.7246 in the p53ab subcohort. The CPH models are as well-calibrated as the OST models in these subcohorts with the 5-year IBS of 0.0736 vs 0.0752 and 0.1467 vs 0.1508, respectively. Considering comparable calibration and better discrimination performance, we recommend the Cox proportional hazards regression over the optimal survival tree models for prognostic EC modelling using patient clinical data.
By enriching the clinical variables with biomarker information, namely estrogen receptor and L1 cell adhesion molecule expression status indicators, peritoneal washing status and tumor size < 5 cm, we improve the discrimination and calibration performance of the CPH and OST models. The OST models better utilize additional features and are overall the best EC risk assessment models in the complete (C-index of 0.8586, IBS at 5 years of 0.0573), p53ab (C-index of 0.7936, IBS at 5 years of 0.1185) and MMRd subcohorts (C-index of 0.8843, IBS at 5 years of 0.0416). Further, we show how interpretable OST decision trees may offer insights into the molecular mechanisms of the EC, where the conventional CPH analysis falls short. The p53ab OST model trained on the extended feature set prioritize the L1 cell adhesion molecule and estrogen receptor status indicators as important predictors in the non-endometrioid p53 aberrant tumors. While the p53ab CPH model reports infinitely wide 95% confidence intervals for the FIGO stages and no model coefficients have p values below the 0.05 threshold of statistical significance. Therefore, due to overall good discrimination and calibration performance, as well as the model interpretability through the decision path analysis, we recommend the OST method over the CPH regression in the endometrial cancer risk assessment, if patient clinical profiles can be enriched with biomarker data.
There are several limitations in our study. Firstly, better prognostic survival models could be created if we had access to an external validation cohort52,53. In general, we hope that the research community could share anonymized patient datasets more freely, as open-access initiatives contribute to the development of better prognostic prediction models54. Further, in addition to the IBS, we are interested in exploring other model calibration measures, such as the integrated calibration index or standardized mortality ratio55. The third limitation stems from the methodological difficulties in the assessment of data imputation methods and their downstream effects. In this work we did not perform any formal tests to identify the missingness type, assuming missing at random for all explanatory covariates56. We performed an ad hoc assessment of imputation quality by comparing imputed variable distributions with those in the complete case cohorts. More robust and comprehensive methods for the assessment of data imputation techniques are needed57.
Conclusion
We show that the Cox proportional hazards and optimal survival tree models are well-suited for the prognostic survival modeling of endometrial carcinoma. The Cox proportional hazards regression is the method of choice for the EC risk assessment on the clinical feature set, consisting of seven variables. Extending clinical variables with the ER and L1CAM status indicators, tumor diameter > 5 cm and peritoneal washing status, improves the discrimination and calibration performance in both model types. Due to the overall best C-index and IBS scores, as well as visually interpretable structure, we recommend optimal survival tree models if clinical variable set can be supplemented with additional biomarker data. Finally, we stress the importance of reporting model discrimination and calibration metrics to promote further adoption of ML prognostic models into the clinical practice.
Data availability
The code and individual survival probabilities estimated using the OST and CPH models are available on https://github.com/netphar/survival_analysis. The datasets used and/or analyzed during the current study available from the corresponding author on reasonable request.
References
Gu, B. et al. Variations in incidence and mortality rates of endometrial cancer at the global, regional, and national levels, 1990–2019. Gynecol. Oncol. 161, 573–580 (2021).
Endometrial cancer statistics. WCRF International https://www.wcrf.org/cancer-trends/endometrial-cancer-statistics/ (2022).
Crosbie, E. & Morrison, J. The emerging epidemic of endometrial cancer: Time to take action. Cochrane Database Syst. Rev. ED000095 (2014).
Alexa, M., Hasenburg, A. & Battista, M. J. The TCGA molecular classification of endometrial cancer and its possible impact on adjuvant treatment decisions. Cancers 13, (2021).
Talhouk, A. et al. A clinically applicable molecular-based classification for endometrial cancers. Br. J. Cancer 113, 299–310 (2015).
Stelloo, E. et al. Refining prognosis and identifying targetable pathways for high-risk endometrial cancer; a TransPORTEC initiative. Mod. Pathol. 28, 836–844 (2015).
Colombo, N. et al. ESMO-ESGO-ESTRO consensus conference on endometrial cancer: Diagnosis, treatment and follow-up. Ann. Oncol. 27, 16–41 (2016).
Talhouk, A. et al. Confirmation of ProMisE: A simple, genomics-based clinical classifier for endometrial cancer. Cancer 123, 802–813 (2017).
Concin, N. et al. ESGO/ESTRO/ESP guidelines for the management of patients with endometrial carcinoma. Int. J. Gynecol. Cancer 31, 12–39 (2021).
Kaplan, E. L. & Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958).
Harrell, F. E. & Jr. Cox Proportional Hazards Regression Model. In Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 475–517 (Springer International Publishing, 2015).
Vale-Silva, L. A. & Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Sci. Rep. 11, 13505 (2021).
Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).
Wang, W. et al. Prediction of endometrial carcinoma using the combination of electronic health records and an ensemble machine learning method. Front. Med. 9, 851890 (2022).
Pergialiotis, V. et al. The utility of artificial neural networks and classification and regression trees for the prediction of endometrial cancer in postmenopausal women. Public Health 164, (2018).
Hart, G. R. et al. Population-based screening for endometrial cancer: Human vs machine intelligence. Front. Artif. Intell. Appl. 3, 539879 (2020).
Troisi, J. et al. Development and validation of a serum metabolomic signature for endometrial cancer screening in postmenopausal women. JAMA Netw. Open 3, e2018327 (2020).
Christodoulou, E. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019).
Dhiman, P. et al. Risk of bias of prognostic models developed using machine learning: A systematic review in oncology. Diagn. Progn. Res. 6, 13 (2022).
Bou-Hamad, I., Larocque, D. & Ben-Ameur, H. A review of survival trees. Stat. Surv. 5, 44–71 (2011).
Banerjee, M., Reynolds, E., Andersson, H. B. & Nallamothu, B. K. Tree-based analysis. Circ. Cardiovasc. Qual. Outcomes 12, e004879 (2019).
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and regression trees (Wadsworth & Brooks, Monterey, CA, 1984).
Hothorn, T., Hornik, K. & Zeileis, A. Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Stat. 15, 651–674 (2006).
Vellido, A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput. Appl. 32, 18069–18083 (2020).
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann Appl. Stat 2, 841–860 (2008).
Vasilev, I., Petrovskiy, M. & Mashechkin, I. Survival Analysis Algorithms based on Decision Trees with Weighted Log-rank Criteria. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM 132–140.
Bertsimas, D., Dunn, J., Gibson, E. & Orfanoudaki, A. Optimal Survival Trees. Preprint at https://arxiv.org/abs/2012.04284 (2020).
Graf, E., Schmoor, C., Sauerbrei, W. & Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18, 2529–2545 (1999).
Alba, A. C. et al. Discrimination and calibration of clinical prediction models: Users’ guides to the medical literature. JAMA 318, 1377–1384 (2017).
D’Agostino, R. B. & Nam, B.-H. Evaluation of the performance of survival analysis models: Discrimination and calibration measures. Handb. Stat. 23, 1–25 (2003).
Holmberg, L. & Vickers, A. Evaluation of prediction models for decision-making: Beyond calibration and discrimination. PLoS Med. 10, e1001491 (2013).
Park, S. Y., Park, J. E., Kim, H. & Park, S. H. Review of statistical methods for evaluating the performance of survival or other time-to-event prediction models (from conventional to deep learning approaches). Korean J. Radiol. 22, 1697–1707 (2021).
Andaur Navarro, C. L. et al. Completeness of reporting of clinical prediction models developed using supervised machine learning: A systematic review. BMC Med. Res. Methodol. 22, 12 (2022).
McLernon, D. J. et al. Assessing performance and clinical usefulness in prediction models with survival outcomes: Practical guidance for Cox proportional hazards models. Preprint at https://www.medrxiv.org/content/https://doi.org/10.1101/2022.03.17.22272411v1 (2022).
Janssen, K. J. M. et al. Missing covariate data in medical research: To impute is better than to ignore. J. Clin. Epidemiol. 63, 721–727 (2010).
Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: What is it and how does it work?. Int. J. Methods Psychiatr. Res. 20, 40–49 (2011).
Ramon-Patino, J. L. et al. Prognosis stratification tools in early-stage endometrial cancer: Could we improve their accuracy? Cancers 14, (2022).
White, I. R. & Royston, P. Imputing missing covariate values for the Cox model. Stat. Med. 28, 1982–1998 (2009).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD statement. Ann. Intern. Med. 162, 55–63 (2015).
Harrell, F. E. & Jr. Parametric Survival Models. In Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 423–451 (Springer International Publishing, 2015).
Breslow, N. Covariance analysis of censored survival data. Biometrics 30, 89–99 (1974).
Bertsekas, D. P. Coordinate Descent. In Nonlinear Programming, Second Edition 160–162 (Athena Scientific, 1999).
Interpretable AI, L. L. C. Interpretable AI Documentation. https://docs.interpretable.ai/stable/ (2022).
van Geloven, N. et al. Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ 377, e069249 (2022).
Gerds, T. A. & Schumacher, M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom. J. 48, 1029–1040 (2006).
Vrede, S. W. et al. Immunohistochemical biomarkers are prognostic relevant in addition to the ESMO-ESGO-ESTRO risk classification in endometrial cancer. Gynecol. Oncol. 161, 787–794 (2021).
Karnezis, A. N. et al. Evaluation of endometrial carcinoma prognostic immunohistochemistry markers in the context of molecular classification. Hip Int. 3, 279–293 (2017).
Reynaers, E. A. E. M., Ezendam, N. P. M. & Pijnenborg, J. M. A. Comparable outcome between endometrioid and non-endometrioid tumors in patients with early-stage high-grade endometrial cancer. J. Surg. Oncol. 111, 790–794 (2015).
Scharl, S. et al. Comparison of survival outcomes and effects of therapy between subtypes of high-grade endometrial cancer–a population-based study. Acta Oncol. 60, 897–903 (2021).
Zeimet, A. G. et al. L1CAM in early-stage type I endometrial cancer: Results of a large multicenter evaluation. J. Natl. Cancer Inst. 105, 1142–1150 (2013).
Van Gool, I. C. et al. Prognostic significance of L1CAM expression and its association with mutant p53 expression in high-risk endometrial cancer. Mod. Pathol. 29, 174–181 (2016).
Steyerberg, E. W. & Harrell, F. E. Jr. Prediction models need appropriate internal, internal-external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016).
Van Calster, B. et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J. Clin. Epidemiol. 74, 167–176 (2016).
Drysdale, E. SurvSet: An open-source time-to-event dataset repository. Preprint at https://arxiv.org/abs/2203.03094 (2022).
Austin, P. C., Harrell, F. E. Jr. & van Klaveren, D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat. Med. 39, 2714–2742 (2020).
White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
Shadbahr, T. et al. Classification of datasets with imputed missing values: Does imputation quality matter? Preprint at https://arxiv.org/abs/2206.08478 (2022).
Acknowledgements
We acknowledge the computational resources provided by the Finnish IT Center for Science. We acknowledge the University of Helsinki for the open-access fees.
Funding
This study was supported by Cancer Foundation Finland, European Research Council (DrugComb, No. 716063), Helsinki University Hospital research funds (TYH2020302), Otto A. Malm Foundation and University of Helsinki Integrative Life Science Doctoral Programme scholarship.
Author information
Authors and Affiliations
Contributions
B.Z. analyzed the data and wrote the manuscript. B.Z., A.P., M.L. and J.T. edited the manuscript. M.L. collected the clinical data. A.P. and R.B. reviewed patient histology, constructed the T.M.A. used for immunohistochemical stainings and scored all the stainings. J.T. supervised the study. All authors have read and agreed to the published version of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zagidullin, B., Pasanen, A., Loukovaara, M. et al. Interpretable prognostic modeling of endometrial cancer. Sci Rep 12, 21543 (2022). https://doi.org/10.1038/s41598-022-26134-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-022-26134-w
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.