Interpretable prognostic modeling of endometrial cancer

Zagidullin, Bulat; Pasanen, Annukka; Loukovaara, Mikko; Bützow, Ralf; Tang, Jing

doi:10.1038/s41598-022-26134-w

Download PDF

Article
Open access
Published: 13 December 2022

Interpretable prognostic modeling of endometrial cancer

Bulat Zagidullin^1,2,
Annukka Pasanen³,
Mikko Loukovaara⁴,
Ralf Bützow^3,4,5 &
…
Jing Tang^1,6

Scientific Reports volume 12, Article number: 21543 (2022) Cite this article

1278 Accesses
3 Altmetric
Metrics details

Subjects

Abstract

Endometrial carcinoma (EC) is one of the most common gynecological cancers in the world. In this work we apply Cox proportional hazards (CPH) and optimal survival tree (OST) algorithms to the retrospective prognostic modeling of disease-specific survival in 842 EC patients. We demonstrate that linear CPH models are preferred for the EC risk assessment based on clinical features alone, while interpretable, non-linear OST models are favored when patient profiles can be supplemented with additional biomarker data. We show how visually interpretable tree models can help generate and explore novel research hypotheses by studying the OST decision path structure, in which L1 cell adhesion molecule expression and estrogen receptor status are correctly indicated as important risk factors in the p53 abnormal EC subgroup. To aid further clinical adoption of advanced machine learning techniques, we stress the importance of quantifying model discrimination and calibration performance in the development of explainable clinical prediction models.

Machine learning survival models trained on clinical data to identify high risk patients with hormone responsive HER2 negative breast cancer

Article Open access 26 May 2023

Bayesian risk prediction model for colorectal cancer mortality through integration of clinicopathologic and genomic data

Article Open access 10 June 2023

An updated PREDICT breast cancer prognostic model including the benefits and harms of radiotherapy

Article Open access 15 January 2024

Introduction

Endometrial carcinoma (EC) is the most common gynecologic malignancy in the OECD member states. In 2020, 417,000 new cases and 97,370 deaths have been attributed to the EC worldwide, which is a 10% increase in incidence and an 8% increase in mortality since 2018. Both metrics vary considerably geographically and across patients’ socioeconomic strata^1,2. In the UK, the expected 5-year survival is 77%, with 85% for stage I disease and 25% for stage IV³. EC treatment options depend on tumor staging and histological findings, which are prone to misdiagnosis⁴. Addition of molecular profiling information to histological features has been shown to improve patient stratification and subsequent selection of adjuvant therapies^5,6,7,8,9. To further improve the EC risk assessment, it is important to develop transparent computational models that utilize both clinical and molecular patient profiles.

Two commonly used statistical methods in the survival analysis of EC patients are the Kaplan–Meier method and the Cox proportional hazards (CPH) regression. The Kaplan–Meier method is used to approximate cumulative survival probability (survival function) from lifetime and censored data¹⁰. It is well-suited to summarize survival functions from full cohorts, and it allows for their visual analysis. The CPH regression is the most popular model for the analysis of survival data when multiple variables are available¹¹. Its utility is limited due to the CPH assumptions, such as the linearity and additivity of predictor variables, as well as the methodological difficulties related to variable selection. Machine learning (ML), such as deep learning and ensemble models, improve on these shortcomings. They have been shown to perform particularly well with high-dimensional datasets, such as -omics readouts, electronic health records, and high content imaging^12,13. Deep learning and ensemble ML models have also been applied to prognostic prediction modeling of patient outcomes in the EC^14,15,16,17. However, these ML models still see limited use in the clinical practice¹⁸. Their poor adoption may be attributed to the black-box nature that complicates model interpretability, a high risk of bias, and the need for larger training datasets to achieve similar performance, as compared to linear Cox regression¹⁹.

Tree-based ML methods have been used to account for non-linear effects and variable interactions in survival analysis²⁰. Tree-based ML methods are interpretable by design as every prediction made by a trained model can be associated with a corresponding decision path, and the hierarchical structure of the model as a whole can be easily visualized. Further, they can take into account factors that may act differently in patient subgroups, unlike linear models that favor global factors with uniform effects across entire patient cohorts²¹. There are several variants of decision trees that can be used to estimate patient risks, such as the CART model proposed by Breiman et al. or the conditional inference tree model by Hothorn et al.^22,23. While decision trees can be ensembled leading to better performance than single trees, like in the random survival forest algorithm by Ishwaran et al., this makes them considerably less interpretable^24,25. In light of recent research advances aimed at improving decision tree algorithms through better splitting and pruning criteria, single decision tree models are a good alternative to the CPH regression in the development of explainable clinical prediction models^26,27.

In this retrospective study we explore a cohort of 842 EC patients with 43 clinicopathological and molecular features collected at the Helsinki University Hospital between 2007 and 2012. We report two interpretable models that predict disease-specific survival: a multivariable CPH regression and a visually interpretable optimal survival tree (OST)²⁷. Both are built on two sets of variables: a clinical set and an extended set, which enriches the former with biomarker data, namely CD171 (L1CAM, L1 cell adhesion molecule expression), estrogen receptor (ER) status, peritoneal washing and tumor size. We use Harrell’s time-independent concordance index (C-index) and time-dependent integrated Brier score (IBS) to compare model performance²⁸. These two measures report related, but distinct performance metrics, as C-index quantifies discrimination, or how well a model separates low-risk from high-risk patients, while IBS also quantifies calibration, which is the extent of an agreement between observed outcomes and model predictions²⁹. In this work we show that to select an optimal EC prognostic model, a discrimination measure should be supplemented with a calibration measure, such as IBS^30,31,32,33. We find that the CPH models trained on the clinical variables have a higher C-index than the OST models, whereas the IBS scores of both model types are comparable. Extending clinical data with biomarker information improves the discrimination and calibration performance in both model types, with a larger improvement and the overall best C-index and IBS scores seen in the OST models. Finally, we suggest that the Cox proportional hazards regression should be used in the EC risk assessment based on clinical data only, while optimal survival trees are preferred when biomarker information is available.

Materials and methods

Study cohort

This retrospective analysis is based on a cohort of 842 patients with unselected EC that underwent surgical treatment between 2007 and 2012 at the Helsinki University Hospital. The follow-up time ranges from 1 to 136 months with a median of 82 months. In total, 591 (70.2%) patients survived until the end of the study, 148 (17.6%) died from the EC, 103 (12.2%) died from other causes. The endpoint of interest is disease-specific survival. Based on tumor molecular profiles derived through The Cancer Genome Atlas project, 604 (71.7%) patients were assigned to one of four ProMisE classes, for the remaining 238 (28.2%) patients the ProMisE categories were not assigned experimentally^5,6. Four categories are: (a) mismatch repair deficient (MMRd), (b) no specific molecular profile (NSMP), (c) p53 abnormal and (d) polymerase-ε hypermutated (POLE). Among 604 patients that have ProMisE classes assigned to them, 74 died due to other causes and 30 belong to the POLE subgroup, where no one died from the EC. Each patient is described with a feature vector consisting of 43 variables, out of which 33 are categorical and 10 are numeric. Please refer to the Supplementary Materials—Extended variable information for a more detailed variable description.

Data preprocessing

All numeric variables, except for age and BMI, are winsorized at the 99% level to limit the effect of extreme values using the quantile function derived via the inverse of an empirical distribution function³⁴. Variables with more than five categories, such as FIGO stage, or those with unbalanced class proportions, such as adjuvant therapy status, are simplified by combining subcategories together.

We impute missing values to prevent the exclusion of observed data³⁵. Missing values are imputed using the multivariate imputation by chained equations method, where numerical and binary variables are predicted with random forest models consisting of 100 decision trees, unordered categorical data with more than two levels are imputed with the polytomous regression, and ordered categorical variables with more than two levels are imputed with the proportional odds model³⁶. Variables are imputed in the order of low to high proportion of missingness. R mice package version 3.14.7 is used to generate 120 imputed datasets, which are subsequently merged by taking mean values for the numeric variables and mode values for categorical variables³⁷. The response variable is kept throughout the imputation³⁸.

Finally, to select variables for the CPH regression models we compare the distributions of numerical and binary categorical variables, stratified by the response. We apply the Pearson correlation coefficient to identify collinear numerical variables, and Goodman and Kruskal’s lambda to identify associated categorical variables. Our primary goal is to optimize the CPH regression performance. Therefore, simplification of categorical data and variable selection in the subsequent steps are iterated several times. We use the analysis of deviance test to compare nested CPH models, while the Akaike information criterion is preferred for the comparison of non-nested models.

The complete experimental pipeline is shown in Fig. 1.

Statistical modeling

We train two types of interpretable models to predict individual survival probabilities for the full patient cohorts, and the subcohorts stratified by the ProMisE classes. We assess model performance using C-index and IBS. We estimate 95% confidence intervals for the performance metrics by 1000 repetitions of the ordinary bootstrap with replacement. We also report the performance of seven additional survival analysis models using the C-index metric in the Supplementary Materials—Additional ML models section. We follow the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative hoping to decrease reporting bias and enable model interoperability³⁹.

Cox proportional hazards model

Survival CPH regression is defined as a product of a non-parametric hazard function λ(t) and the e^Xβ term, where t is time, X is a vector of variables describing a patient, and β is a vector of the model's coefficients. The λ(t) part of the CPH model is identical for all patients at a time t. It is referred to as a hazard function of a standard patient, which is a patient with Xβ = 0. The second term is patient-specific, and it is used to calculate a hazard ratio without knowing the hazard function λ(t), where the hazard ratio is the risk of death in relation to a control⁴⁰. We use the Breslow method, as implemented in R riskRegression package version 2022.03.22, to specify the hazard function, which is required for estimating individual survival probabilities⁴¹. We use Schoenfeld residuals to test for the proportional hazards assumption, as implemented in R survival package version 3.3.1. We estimate CPH model parameters by maximizing the partial log-likelihood.

Optimal survival tree model

We use the optimal survival tree (OST) method to develop interpretable decision tree models for estimating patient survival probabilities. The OST algorithm creates multiple candidate decision trees and optimizes their variable splitting thresholds one variable at a time using coordinate descent⁴². The main idea is to use previously optimized parameters in subsequent splitting criteria updates, ultimately outputting a single decision tree that can be visually examined. The OST loss function compares how close the predicted e^Xβ terms for each patient are to the cumulative survival probabilities, obtained by the Nelson-Aalen estimator²⁷. We prioritize model robustness in the training process by: (a) limiting the tree size, since too deep or too wide trees obfuscate the model interpretability, (b) increasing the number of random restarts to use in the local search algorithm, and (c) controlling the minimum number of points that must be present in every leaf node of the fitted trees. The complexity parameter that determines the tradeoff between the accuracy and model complexity is tuned automatically by assessing the out-of-sample performance. The patient cohorts for the OST model training are split, such that the complete cases are used for model fitting, and the imputed subsets are used for validation. The final validated models are then retrained on the combined (complete case and imputed) patient cohorts. We fit the OST models, as implemented in R iai package version 1.7.0, using the log-likelihood criterion⁴³.

Model performance metrics

C-index reports model discrimination performance, i.e. the model’s ability to predict correct rankings of the survival times. C-index is defined as a ratio of concordant pairs of subjects to the total number of comparable pairs. A pair is concordant when a subject with shorter survival time is estimated to have a higher risk than the one with longer survival time. A pair is comparable if (a) it is possible to determine which subject experienced the event first or (b) a subject with a shorter survival time experienced an event, while the other one is censored and is not lost to follow-up yet. C-index ranges between 0 and 1, where higher values are better.

IBS reports both model discrimination and calibration, i.e. the extent of an agreement between observed outcomes and model predictions⁴⁴. Brier score is defined as a mean squared difference between event indicators and predicted survival probabilities at a time t²⁸. By summing Brier scores over a time interval we obtain the integrated Brier score (IBS), which is then adjusted for patients lost to follow-up using the inverse probability censoring weighting method⁴⁵. We use R pec package version 2022.05.04 to compute IBS at 12, 24, 60, and 136 months based on the predicted individual survival probabilities of patients. IBS ranges between 0 and 1, where lower values are better.

Computational resources

All computations are performed using R 4.2.0 on MacOS 12.5 and Python 3.9.7 on Ubuntu 20.04 LTS.

Institutional review board statement

This study was approved by the Institutional Review Board of the Helsinki University Hospital (journal number 135/13/03/03/2013) and conducted according to the guidelines of the Declaration of Helsinki.

Informed consent

Participant informed consent was waived because this was a retrospective study. The Institutional Review Board of the Helsinki University Hospital called for an approval by the National Supervisory Authority for Welfare and Health, which was granted (journal number 753/06.01.03.01/2016).

Results

The initial cohort consists of 842 patients diagnosed with unselected endometrial carcinoma. Following the missing value imputation, excluding subjects that died due to other causes (n = 103) and those that belong to the POLE group (n = 39), where no one died, the final analysis cohort consists of 700 patients. Among 700 patients in the final cohort, 305 (43.6%) belong to the MMRd subgroup, 308 (44%) belong to the NSMP subgroup and 87 (12.4%) belong to the p53ab subgroup. Majority of the tumors are histopathological grade 1–2 (74%) and FIGO stage I disease (73%). The median follow-up time for censored cases is 92 (interquartile range, 78–122) months. There are 182 subjects who had disease recurrence and 147 that died during the follow-up time. Patient demographics are shown in Table 1.

Table 1 Patient demographics (n = 700). Feature set I consists of 7 features (FSI), and feature set II consists of 11 features (FSII). FIGO stage refers to the International Federation of Gynecology and Obstetrics staging system, ProMisE stands for Proactive Molecular Risk Classifier for Endometrial Cancer, MMRd – mismatch repair deficient, NSMP – no specific molecular profile, p53ab – p53 aberrant, ER – estrogen receptor status, CD171 – L1 cell adhesion molecule expression status (L1CAM).

Full size table

The multivariable CPH models are compared with the OST models in prediction of the disease-specific survival using two feature sets in four patient cohorts. Variable selection for both feature sets is performed to optimize the CPH discrimination performance. Subsequently, the OST models are fit on the selected feature sets. The feature set I (FSI) consists of seven variables: age, FIGO stage, histological subgroup, ProMisE, deep myometrial invasion, lymphovascular invasion, and tumor diameter > 3 cm. The feature set II (FSII) adds four more variables to the FSI, namely tumor diameter > 5 cm, peritoneal washing status, ER status and CD171 expression (L1CAM, postoperative L1 cell-adhesion molecule expression status).

Model discrimination

The C-index scores of the CPH and OST models with the corresponding 95% confidence intervals are shown in Table 2.

Table 2 C-index of the Cox proportional hazards (CPH) models vs optimal survival tree (OST) models using. Two feature sets are: FSI (7 features) and FSII (11 features). Models in bold perform the best in their corresponding cohorts. NSMP refers to no specific molecular profile subtype, MMRd – mismatch repair deficient, p53ab – p53 aberrant. 95% confidence intervals (CI95) are calculated using 1,000 iterations of the ordinary bootstrap with replacement.

Full size table

Model discrimination performance is improved by the inclusion of four additional biomarker variables, as indicated by higher C-index scores in the FSII versus FSI feature sets. The OST models trained on the FSII report the highest overall C-index in all ProMisE subcohorts, but the NSMP. Where the CPH model trained on the FSI has the best C-index of 0.8376, followed by the OST model with the C-index of 0.8368. We note that the CPH models trained on the FSI feature set report on average 2.2% higher C-index than the OST models. This trend is reversed in the FSII, where the OST models report on average 2.5% better C-index than the CPH models. The largest C-index increase in the OST models is 10.8% in the MMRd and 8.7% in the p53ab subcohorts, while in the CPH models it is 1.4% in the p53ab subcohort. Overall, non-linear optimal survival tree models benefit more from the additional biomarker data than the linear Cox proportional hazards models.

Model calibration

We report the IBS scores with the 95% confidence intervals for both the CPH and the OST models at 12, 24, and 60 months, and the overall IBS at 136 months of follow-up in Fig. 2 and Supplementary Table 1.

All models across all cohorts and feature sets show better (i.e. lower) IBS at shorter follow-up times, e.g. the IBS scores at 12 months are up to an order of magnitude lower than at 136 months of follow-up. Both OST and CPH model types generally report better IBS scores when trained on a larger feature set (FSII), as compared with the models trained on FSI. The OST models trained on the FSII report 15.4% better IBS at 1 year, 21.6% better IBS at 2 years, 21.0% better IBS at 5 years and 16.2% better IBS at the complete follow-up, as compared with the FSI-trained OST models. The IBS improvements for the CPH models trained on the FSII are 5.7% at 1 year, 7.5% at 2 years, 3.9% at 5 years and 4.4% at the complete follow-up, as compared with the FSI-trained models. Both model types are on par with each other on the FSI feature set, however, the OST models have better IBS scores than the CPH models on the FSII set. The OST models improve more from the additional biomarker data than the CPH models.

Model interpretation

The hazard ratios with the corresponding 95% confidence intervals of the CPH models trained on a full cohort on two feature sets are in Table 3. It is important to note that the interpretation of the CPH model coefficients should be performed when the proportional hazards (PH) assumption is satisfied. We found evidence that according to the Schoenfeld residual test “non-endometrioid” and “estrogen receptor positive” terms do not satisfy the PH assumption in the CPH models built on the FSI and FSII feature sets. Upon the visual inspection, the violations are minor for both. Further, since both model types pass the global PH test with p values of 0.245 and 0.25, respectively, we deem it appropriate to ignore the PH violations.

Table 3 Hazard ratios (HR) of the Cox proportional hazards model trained on the full cohort using 7 features (FSI) and 11 features (FSII) with Wald 95% confidence intervals and log-rank test p values. FIGO stage refers to the International Federation of Gynecology and Obstetrics staging system, ProMisE stands for Proactive Molecular Risk Classifier for Endometrial Cancer, MMRd – mismatch repair deficient, NSMP – no specific molecular profile, p53ab – p53 aberrant, ER – estrogen receptor status, and CD171 – L1 cell adhesion molecule (L1CAM).

Full size table

Age, more advanced disease stages, larger tumor sizes, deep myometrial invasion, lymphovascular space invasion, positive peritoneal washing, negative ER status and positive CD171 are associated with poor survival^46,47. The MMRd and p53ab classes are identified as more aggressive EC forms than the NSMP class, with the HR of 1.61 and 2 (1.8 and 1.88 in the FSII), respectively. Similarly, histological subgroup G3 is associated with a higher risk of death than the G1-G2 subgroup with the HR of 2.04 on the FSI and 1.66 on the FSII. Interestingly, the non-endometrioid EC subgroup is not robustly associated with a higher risk in either FSI or FSII feature sets, with HR of 1.35, p value 0.24 and HR of 0.95, p value 0.87, respectively. This ambiguity in assessing the survival differences between type I and type II tumors has been previously reported in the literature^48,49.

We next explore how the tree models may supplement conventional linear methods in the interpretation of EC risk factors by studying the OST and CPH model types trained on the p53ab subcohort and the FSII feature set. We focus on the p53ab subgroup (n = 87), as it shows the largest relative improvement in the C-index from the additional biomarker data in the CPH models (0.7636 vs 0.7744) and the second largest in the OST models (0.7246 vs 0.7936). The CPH IBS values improve by 3% in FSII, whereas for the OST model the improvement is 19%. The HR scores with the 95% confidence intervals of the FSII-trained CPH model are in Table 4. The decision tree for the FSII-trained OST model is in Fig. 3.

Table 4 Hazard ratios (HR) of the p53ab subcohort Cox proportional hazards model trained on the FSII with Wald 95% confidence intervals and log-rank test p values. FIGO stage refers to the International Federation of Gynecology and Obstetrics staging system, ProMisE stands for Proactive Molecular Risk Classifier for Endometrial Cancer, MMRd – mismatch repair deficient, NSMP – no specific molecular profile, p53ab – p53 aberrant, ER – estrogen receptor status, and CD171 – L1 cell adhesion molecule expression status (L1CAM).

Full size table

The p53ab CPH model reports a relatively high C-index of 0.7744, but the model coefficients are not always informative and require additional validation. For instance, HR 95% confidence intervals of the “FIGO stage” terms do not have an upper bound and all the coefficients’ p values are above the 0.05 threshold of statistical significance (Table 4). Therefore, it is not advised to use this model as is for the downstream tasks that require model interpretation, such as designing nomograms to estimate event probabilities in a clinical setting. The p53ab OST model reports a C-index of 0.7936, and the decision tree path recapitulates some of the existing clinical knowledge (Fig. 3). The OST model selects the CD171 status (L1CAM, L1 cell adhesion molecule expression) as the most informative variable to stratify the cohort on and marks the estrogen receptor status as an important risk factor in the p53ab group. Significance of the ER and CD171 biomarkers in the non-endometrioid p53 aberrant tumors, which are overrepresented in the p53ab ProMisE subcohort with 49.5% of subjects belonging to the non-endometrioid EC subtype versus 11.7% in the full cohort, has been previously reported^47,50,51. This analysis demonstrates how tree-based ML can supplement and even supersede conventional Cox regression in the EC risk assessment if routinely collected clinical data can be enriched with biomarker information.

Discussion

We have trained the CPH and OST models on the full, MMRd, NSMP and p53ab ProMisE subcohorts using clinical and extended feature sets. Linear CPH and non-linear OST models trained on seven clinical variables report comparable discrimination performance with the C-index of 0.8425 vs 0.8493 in the complete cohort, and 0.8376 vs 0.8368 in the NSMP subcohort. Model calibration scores are also similar with a 5-year IBS of 0.0677 vs 0.0666 in the full cohort and 0.0309 vs 0.0314 in the NSMP subcohort. In contrast, the CPH models have a better discrimination performance than the OST models with the C-index of 0.82 vs 0.7886 in the MMRd subcohort, and 0.7636 vs 0.7246 in the p53ab subcohort. The CPH models are as well-calibrated as the OST models in these subcohorts with the 5-year IBS of 0.0736 vs 0.0752 and 0.1467 vs 0.1508, respectively. Considering comparable calibration and better discrimination performance, we recommend the Cox proportional hazards regression over the optimal survival tree models for prognostic EC modelling using patient clinical data.

By enriching the clinical variables with biomarker information, namely estrogen receptor and L1 cell adhesion molecule expression status indicators, peritoneal washing status and tumor size < 5 cm, we improve the discrimination and calibration performance of the CPH and OST models. The OST models better utilize additional features and are overall the best EC risk assessment models in the complete (C-index of 0.8586, IBS at 5 years of 0.0573), p53ab (C-index of 0.7936, IBS at 5 years of 0.1185) and MMRd subcohorts (C-index of 0.8843, IBS at 5 years of 0.0416). Further, we show how interpretable OST decision trees may offer insights into the molecular mechanisms of the EC, where the conventional CPH analysis falls short. The p53ab OST model trained on the extended feature set prioritize the L1 cell adhesion molecule and estrogen receptor status indicators as important predictors in the non-endometrioid p53 aberrant tumors. While the p53ab CPH model reports infinitely wide 95% confidence intervals for the FIGO stages and no model coefficients have p values below the 0.05 threshold of statistical significance. Therefore, due to overall good discrimination and calibration performance, as well as the model interpretability through the decision path analysis, we recommend the OST method over the CPH regression in the endometrial cancer risk assessment, if patient clinical profiles can be enriched with biomarker data.

There are several limitations in our study. Firstly, better prognostic survival models could be created if we had access to an external validation cohort^52,53. In general, we hope that the research community could share anonymized patient datasets more freely, as open-access initiatives contribute to the development of better prognostic prediction models⁵⁴. Further, in addition to the IBS, we are interested in exploring other model calibration measures, such as the integrated calibration index or standardized mortality ratio⁵⁵. The third limitation stems from the methodological difficulties in the assessment of data imputation methods and their downstream effects. In this work we did not perform any formal tests to identify the missingness type, assuming missing at random for all explanatory covariates⁵⁶. We performed an ad hoc assessment of imputation quality by comparing imputed variable distributions with those in the complete case cohorts. More robust and comprehensive methods for the assessment of data imputation techniques are needed⁵⁷.

Conclusion

We show that the Cox proportional hazards and optimal survival tree models are well-suited for the prognostic survival modeling of endometrial carcinoma. The Cox proportional hazards regression is the method of choice for the EC risk assessment on the clinical feature set, consisting of seven variables. Extending clinical variables with the ER and L1CAM status indicators, tumor diameter > 5 cm and peritoneal washing status, improves the discrimination and calibration performance in both model types. Due to the overall best C-index and IBS scores, as well as visually interpretable structure, we recommend optimal survival tree models if clinical variable set can be supplemented with additional biomarker data. Finally, we stress the importance of reporting model discrimination and calibration metrics to promote further adoption of ML prognostic models into the clinical practice.

Data availability

The code and individual survival probabilities estimated using the OST and CPH models are available on https://github.com/netphar/survival_analysis. The datasets used and/or analyzed during the current study available from the corresponding author on reasonable request.

References

Gu, B. et al. Variations in incidence and mortality rates of endometrial cancer at the global, regional, and national levels, 1990–2019. Gynecol. Oncol. 161, 573–580 (2021).
Article Google Scholar
Endometrial cancer statistics. WCRF International https://www.wcrf.org/cancer-trends/endometrial-cancer-statistics/ (2022).
Crosbie, E. & Morrison, J. The emerging epidemic of endometrial cancer: Time to take action. Cochrane Database Syst. Rev. ED000095 (2014).
Alexa, M., Hasenburg, A. & Battista, M. J. The TCGA molecular classification of endometrial cancer and its possible impact on adjuvant treatment decisions. Cancers 13, (2021).
Talhouk, A. et al. A clinically applicable molecular-based classification for endometrial cancers. Br. J. Cancer 113, 299–310 (2015).
Article CAS Google Scholar
Stelloo, E. et al. Refining prognosis and identifying targetable pathways for high-risk endometrial cancer; a TransPORTEC initiative. Mod. Pathol. 28, 836–844 (2015).
Article CAS Google Scholar
Colombo, N. et al. ESMO-ESGO-ESTRO consensus conference on endometrial cancer: Diagnosis, treatment and follow-up. Ann. Oncol. 27, 16–41 (2016).
Article CAS Google Scholar
Talhouk, A. et al. Confirmation of ProMisE: A simple, genomics-based clinical classifier for endometrial cancer. Cancer 123, 802–813 (2017).
Article CAS Google Scholar
Concin, N. et al. ESGO/ESTRO/ESP guidelines for the management of patients with endometrial carcinoma. Int. J. Gynecol. Cancer 31, 12–39 (2021).
Article Google Scholar
Kaplan, E. L. & Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958).
Article MathSciNet MATH Google Scholar
Harrell, F. E. & Jr. Cox Proportional Hazards Regression Model. In Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 475–517 (Springer International Publishing, 2015).
Vale-Silva, L. A. & Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Sci. Rep. 11, 13505 (2021).
Article ADS CAS Google Scholar
Moncada-Torres, A., van Maaren, M. C., Hendriks, M. P., Siesling, S. & Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Sci. Rep. 11, 6968 (2021).
Article ADS CAS Google Scholar
Wang, W. et al. Prediction of endometrial carcinoma using the combination of electronic health records and an ensemble machine learning method. Front. Med. 9, 851890 (2022).
Article Google Scholar
Pergialiotis, V. et al. The utility of artificial neural networks and classification and regression trees for the prediction of endometrial cancer in postmenopausal women. Public Health 164, (2018).
Hart, G. R. et al. Population-based screening for endometrial cancer: Human vs machine intelligence. Front. Artif. Intell. Appl. 3, 539879 (2020).
Article Google Scholar
Troisi, J. et al. Development and validation of a serum metabolomic signature for endometrial cancer screening in postmenopausal women. JAMA Netw. Open 3, e2018327 (2020).
Article Google Scholar
Christodoulou, E. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019).
Article Google Scholar
Dhiman, P. et al. Risk of bias of prognostic models developed using machine learning: A systematic review in oncology. Diagn. Progn. Res. 6, 13 (2022).
Article Google Scholar
Bou-Hamad, I., Larocque, D. & Ben-Ameur, H. A review of survival trees. Stat. Surv. 5, 44–71 (2011).
Article MathSciNet MATH Google Scholar
Banerjee, M., Reynolds, E., Andersson, H. B. & Nallamothu, B. K. Tree-based analysis. Circ. Cardiovasc. Qual. Outcomes 12, e004879 (2019).
Article Google Scholar
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and regression trees (Wadsworth & Brooks, Monterey, CA, 1984).
MATH Google Scholar
Hothorn, T., Hornik, K. & Zeileis, A. Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Stat. 15, 651–674 (2006).
Article MathSciNet Google Scholar
Vellido, A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput. Appl. 32, 18069–18083 (2020).
Article Google Scholar
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann Appl. Stat 2, 841–860 (2008).
Article MathSciNet MATH Google Scholar
Vasilev, I., Petrovskiy, M. & Mashechkin, I. Survival Analysis Algorithms based on Decision Trees with Weighted Log-rank Criteria. In Proceedings of the 11th International Conference on Pattern Recognition Applications and Methods - ICPRAM 132–140.
Bertsimas, D., Dunn, J., Gibson, E. & Orfanoudaki, A. Optimal Survival Trees. Preprint at https://arxiv.org/abs/2012.04284 (2020).
Graf, E., Schmoor, C., Sauerbrei, W. & Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 18, 2529–2545 (1999).
Article CAS Google Scholar
Alba, A. C. et al. Discrimination and calibration of clinical prediction models: Users’ guides to the medical literature. JAMA 318, 1377–1384 (2017).
Article Google Scholar
D’Agostino, R. B. & Nam, B.-H. Evaluation of the performance of survival analysis models: Discrimination and calibration measures. Handb. Stat. 23, 1–25 (2003).
Article MathSciNet Google Scholar
Holmberg, L. & Vickers, A. Evaluation of prediction models for decision-making: Beyond calibration and discrimination. PLoS Med. 10, e1001491 (2013).
Article Google Scholar
Park, S. Y., Park, J. E., Kim, H. & Park, S. H. Review of statistical methods for evaluating the performance of survival or other time-to-event prediction models (from conventional to deep learning approaches). Korean J. Radiol. 22, 1697–1707 (2021).
Article Google Scholar
Andaur Navarro, C. L. et al. Completeness of reporting of clinical prediction models developed using supervised machine learning: A systematic review. BMC Med. Res. Methodol. 22, 12 (2022).
Article Google Scholar
McLernon, D. J. et al. Assessing performance and clinical usefulness in prediction models with survival outcomes: Practical guidance for Cox proportional hazards models. Preprint at https://www.medrxiv.org/content/https://doi.org/10.1101/2022.03.17.22272411v1 (2022).
Janssen, K. J. M. et al. Missing covariate data in medical research: To impute is better than to ignore. J. Clin. Epidemiol. 63, 721–727 (2010).
Article Google Scholar
Azur, M. J., Stuart, E. A., Frangakis, C. & Leaf, P. J. Multiple imputation by chained equations: What is it and how does it work?. Int. J. Methods Psychiatr. Res. 20, 40–49 (2011).
Article Google Scholar
Ramon-Patino, J. L. et al. Prognosis stratification tools in early-stage endometrial cancer: Could we improve their accuracy? Cancers 14, (2022).
White, I. R. & Royston, P. Imputing missing covariate values for the Cox model. Stat. Med. 28, 1982–1998 (2009).
Article MathSciNet Google Scholar
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD statement. Ann. Intern. Med. 162, 55–63 (2015).
Article Google Scholar
Harrell, F. E. & Jr. Parametric Survival Models. In Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 423–451 (Springer International Publishing, 2015).
Breslow, N. Covariance analysis of censored survival data. Biometrics 30, 89–99 (1974).
Article CAS Google Scholar
Bertsekas, D. P. Coordinate Descent. In Nonlinear Programming, Second Edition 160–162 (Athena Scientific, 1999).
Interpretable AI, L. L. C. Interpretable AI Documentation. https://docs.interpretable.ai/stable/ (2022).
van Geloven, N. et al. Validation of prediction models in the presence of competing risks: a guide through modern methods. BMJ 377, e069249 (2022).
Article Google Scholar
Gerds, T. A. & Schumacher, M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom. J. 48, 1029–1040 (2006).
Article MathSciNet MATH Google Scholar
Vrede, S. W. et al. Immunohistochemical biomarkers are prognostic relevant in addition to the ESMO-ESGO-ESTRO risk classification in endometrial cancer. Gynecol. Oncol. 161, 787–794 (2021).
Article CAS Google Scholar
Karnezis, A. N. et al. Evaluation of endometrial carcinoma prognostic immunohistochemistry markers in the context of molecular classification. Hip Int. 3, 279–293 (2017).
CAS Google Scholar
Reynaers, E. A. E. M., Ezendam, N. P. M. & Pijnenborg, J. M. A. Comparable outcome between endometrioid and non-endometrioid tumors in patients with early-stage high-grade endometrial cancer. J. Surg. Oncol. 111, 790–794 (2015).
Article CAS Google Scholar
Scharl, S. et al. Comparison of survival outcomes and effects of therapy between subtypes of high-grade endometrial cancer–a population-based study. Acta Oncol. 60, 897–903 (2021).
Article CAS Google Scholar
Zeimet, A. G. et al. L1CAM in early-stage type I endometrial cancer: Results of a large multicenter evaluation. J. Natl. Cancer Inst. 105, 1142–1150 (2013).
Article Google Scholar
Van Gool, I. C. et al. Prognostic significance of L1CAM expression and its association with mutant p53 expression in high-risk endometrial cancer. Mod. Pathol. 29, 174–181 (2016).
Article Google Scholar
Steyerberg, E. W. & Harrell, F. E. Jr. Prediction models need appropriate internal, internal-external, and external validation. J. Clin. Epidemiol. 69, 245–247 (2016).
Article Google Scholar
Van Calster, B. et al. A calibration hierarchy for risk models was defined: from utopia to empirical data. J. Clin. Epidemiol. 74, 167–176 (2016).
Article Google Scholar
Drysdale, E. SurvSet: An open-source time-to-event dataset repository. Preprint at https://arxiv.org/abs/2203.03094 (2022).
Austin, P. C., Harrell, F. E. Jr. & van Klaveren, D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat. Med. 39, 2714–2742 (2020).
Article MathSciNet Google Scholar
White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: Issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
Article MathSciNet Google Scholar
Shadbahr, T. et al. Classification of datasets with imputed missing values: Does imputation quality matter? Preprint at https://arxiv.org/abs/2206.08478 (2022).

Download references

Acknowledgements

We acknowledge the computational resources provided by the Finnish IT Center for Science. We acknowledge the University of Helsinki for the open-access fees.

Funding

This study was supported by Cancer Foundation Finland, European Research Council (DrugComb, No. 716063), Helsinki University Hospital research funds (TYH2020302), Otto A. Malm Foundation and University of Helsinki Integrative Life Science Doctoral Programme scholarship.

Author information

Authors and Affiliations

Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland
Bulat Zagidullin & Jing Tang
Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, 00290, Helsinki, Finland
Bulat Zagidullin
Department of Pathology, University of Helsinki and Helsinki University Hospital, 00290, Helsinki, Finland
Annukka Pasanen & Ralf Bützow
Department of Obstetrics and Gynecology, Helsinki University Hospital and University of Helsinki, 00290, Helsinki, Finland
Mikko Loukovaara & Ralf Bützow
Research Program in Applied Tumor Genomics, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland
Ralf Bützow
Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, 00290, Helsinki, Finland
Jing Tang

Authors

Bulat Zagidullin
View author publications
You can also search for this author in PubMed Google Scholar
Annukka Pasanen
View author publications
You can also search for this author in PubMed Google Scholar
Mikko Loukovaara
View author publications
You can also search for this author in PubMed Google Scholar
Ralf Bützow
View author publications
You can also search for this author in PubMed Google Scholar
Jing Tang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

B.Z. analyzed the data and wrote the manuscript. B.Z., A.P., M.L. and J.T. edited the manuscript. M.L. collected the clinical data. A.P. and R.B. reviewed patient histology, constructed the T.M.A. used for immunohistochemical stainings and scored all the stainings. J.T. supervised the study. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Bulat Zagidullin or Jing Tang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Zagidullin, B., Pasanen, A., Loukovaara, M. et al. Interpretable prognostic modeling of endometrial cancer. Sci Rep 12, 21543 (2022). https://doi.org/10.1038/s41598-022-26134-w

Download citation

Received: 07 September 2022
Accepted: 09 December 2022
Published: 13 December 2022
DOI: https://doi.org/10.1038/s41598-022-26134-w

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Machine learning survival models trained on clinical data to identify high risk patients with hormone responsive HER2 negative breast cancer

Bayesian risk prediction model for colorectal cancer mortality through integration of clinicopathologic and genomic data

An updated PREDICT breast cancer prognostic model including the benefits and harms of radiotherapy

Introduction

Materials and methods

Study cohort

Data preprocessing

Statistical modeling

Cox proportional hazards model

Optimal survival tree model

Model performance metrics

Computational resources

Institutional review board statement

Informed consent

Results

Model discrimination

Model calibration

Model interpretation

Discussion

Conclusion

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links