Machine learning based prediction for oncologic outcomes of renal cell carcinoma after surgery using Korean Renal Cell Carcinoma (KORCC) database

Kim, Jung Kwon; Lee, Sangchul; Hong, Sung Kyu; Kwak, Cheol; Jeong, Chang Wook; Kang, Seok Ho; Hong, Sung-Hoo; Kim, Yong-June; Chung, Jinsoo; Hwang, Eu Chang; Kwon, Tae Gyun; Byun, Seok-Soo; Jung, Yu Jin; Lim, Junghyun; Kim, Jiyeon; Oh, Hyeju

doi:10.1038/s41598-023-30826-2

Article
Open access
Published: 08 April 2023

Jung Kwon Kim^1,2,
Sangchul Lee^1,2,
Sung Kyu Hong^1,2,
Cheol Kwak^2,3,
Chang Wook Jeong^2,3,
Seok Ho Kang⁴,
Sung-Hoo Hong⁵,
Yong-June Kim⁶,
Jinsoo Chung⁷,
Eu Chang Hwang⁸,
Tae Gyun Kwon⁹,
Seok-Soo Byun^1,10,
Yu Jin Jung¹⁰,
Junghyun Lim¹¹,
Jiyeon Kim¹¹ &
Hyeju Oh¹¹

Scientific Reports volume 13, Article number: 5778 (2023)

Abstract

We developed a novel prediction model for recurrence and survival in patients with localized renal cell carcinoma (RCC) after surgery, using machine learning (ML) to improve predictive accuracy on a large Asian nationwide dataset: the updated KOrean Renal Cell Carcinoma (KORCC) database, which covers a total of 10,068 patients who received surgery for RCC. After data pre-processing, feature selection was performed with an elastic net. Nine variables for recurrence and 13 variables for survival were extracted from 206 variables. The synthetic minority oversampling technique (SMOTE) was applied to the training data set to address the class imbalance problem. We applied most of the ML algorithms introduced to date to evaluate performance. We also performed subgroup analysis according to histologic type. All prediction models achieved high accuracy (range, 0.77–0.94) and F1-score (range, 0.77–0.97) in all tested metrics. In an external validation set, high accuracy and F1-score were well maintained for both recurrence and survival. In subgroup analysis of both the clear and non-clear cell type RCC groups, we also found good prediction performance.

Introduction

The incidence of renal cell carcinoma (RCC) is increasing worldwide. Approximately 76,000 new cases and almost 14,000 deaths from RCC were reported in the US in 2021¹. In Korea, the same trend has been observed according to the latest cancer incidence statistics from the Korea Central Cancer Registry². Clear cell type RCC represents approximately 70% of cases in adults³. The estimated 5-year survival rate of patients with localized RCC is approximately 90%. However, in the approximately 30% of cases with either recurrence or metastasis, the survival rate is drastically reduced⁴. Thus, it is imperative to identify the group at high risk of recurrence in advance and to establish a differentiated surveillance protocol for patients who have undergone curative surgery.

Over the past decades, several nomograms for recurrence and/or survival of localized RCC have been developed and applied in clinical practice^5,6,7,8. Among them, the Kattan nomogram based on pathological T stage, nuclear grade, tumor size, necrosis, vascular invasion, and clinical presentation was the first introduced and widely used model^5,6. Subsequently, the Leibovich model was developed by Mayo Clinic to estimate the risk of metastasis or recurrence using tumor stage, regional lymph node status, tumor size, nuclear grade and histologic tumor necrosis⁷. The most recently developed model known as the GRANT score was based on patient age, nuclear grade, and pathologic T/N stage⁸. However, these models were developed and validated using a small cohort from a single institution. In addition, they were limited to Western datasets. Moreover, their prediction accuracies were not as high as expected. For most models, their accuracy values were around 0.7^5,6,7,8.

Thus, we tried to develop a novel prediction model for recurrence and survival in patients with localized RCC after surgery using a large Asian nationwide dataset. We also used a novel statistical method of machine learning (ML) to improve accuracy in predicting outcomes.

Materials and methods

Ethics statement

The Institutional Review Board (IRB) of Seoul National University Bundang Hospital approved this study (approval number: B-2106-688-108). The requirement for obtaining written informed consent from patients was waived by the IRB due to the retrospective nature of this study. Personal identifiers were completely deleted to ensure that data were analyzed anonymously. Our study was conducted according to the ethical standards of the 1964 Declaration of Helsinki and its later amendments.

Data sets

The KOrean Renal Cell Carcinoma (KORCC) database was first established in 2011. It had data from eight academic institutions nationwide⁹. Recently, data of each institution were updated from March to June 2021. Subsequently, the updated KORCC database covered data of a total of 10,068 patients who had received surgery for RCC with 206 variables, including demographic, perioperative, pathologic, and survival information.

Model development (n = 4,829) and internal validation (n = 2,070) were performed using data from the seven centers other than Seoul National University Bundang Hospital (SNUBH, n = 3,169). External validation was performed using data from SNUBH to assess the generalizability of model performance. SNUBH was suitable for external validation because of its size and diverse patient population.
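The split described above (one center held out for external validation, the remaining centers divided roughly 7:3) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the `center` field, the function name `split_by_center`, and the toy center labels "A" and "B" are all assumptions.

```python
import random

def split_by_center(records, external_center, train_fraction=0.7, seed=0):
    """Hold out one center as the external set; split the rest into train/test."""
    external = [r for r in records if r["center"] == external_center]
    internal = [r for r in records if r["center"] != external_center]
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    rng.shuffle(internal)
    cut = round(train_fraction * len(internal))
    return internal[:cut], internal[cut:], external

# Hypothetical toy registry: 20 patients from two centers plus 5 from SNUBH.
records = (
    [{"id": i, "center": "A"} for i in range(12)]
    + [{"id": i, "center": "B"} for i in range(12, 20)]
    + [{"id": i, "center": "SNUBH"} for i in range(20, 25)]
)
train, test, external = split_by_center(records, external_center="SNUBH")
print(len(train), len(test), len(external))
```

Keeping the external center entirely out of both the training and internal test sets is what allows it to act as a check on between-institution differences.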

All study procedures were performed according to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) recommendations¹⁰.

All institutions obtained IRB approvals before inputting data into the database. Unified data templates were used for consistent data collection at each institution. Survival data were retrospectively reviewed from medical records or identified from death certificate data.

Data processing and feature selection

Data pre-processing mainly consisted of handling missing values to obtain a reliable data set. The missing value imputation process was divided into three aspects: patients, predictors, and statistics. First, we eliminated patients with missing basic information. Subsequently, we performed predictive imputation for variables including total protein, Hb, and creatinine. For this method, we used Euclidean distance to determine the similarity between two values and replaced the missing one with a similar one. Other missing values were imputed using k-nearest neighbor (KNN)¹¹. KNN is a non-parametric, instance-based method that is useful for datasets having both qualitative and quantitative attribute values.
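As an illustration of distance-based imputation, the sketch below fills a missing value with the mean of that variable over the k most similar complete records, where similarity is Euclidean distance on the observed variables. It is a simplified sketch under stated assumptions: the function name `knn_impute`, the toy laboratory values, and the choice of k are hypothetical, and a real pipeline would normally scale the variables before computing distances.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the column mean over the k nearest complete rows.

    Distance is Euclidean, computed only on the columns observed in the
    incomplete row. Assumes at least one complete row exists.
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]

        def dist(other):
            return math.sqrt(sum((row[j] - other[j]) ** 2 for j in observed))

        neighbours = sorted(complete, key=dist)[:k]
        imputed.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return imputed

# Hypothetical values: [total protein (g/dL), Hb (g/dL), creatinine (mg/dL)]
labs = [
    [7.0, 14.0, 0.9],
    [7.2, 13.5, 1.0],
    [6.1, 10.0, 1.8],
    [7.1, 13.8, None],   # creatinine missing
]
filled = knn_impute(labs, k=2)
print(filled[3])
```

In this toy example the two nearest complete records are the first two rows, so the missing creatinine is filled with their mean.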

After pre-processing, we performed feature selection with an elastic net¹². Before fitting the elastic net model, we defined four default variables that had been considered the most significant predictors of recurrence and survival: gender, age at surgery, smoking, and BMI^13,14. The elastic net is a hybrid of ridge regression and lasso regularization. Thus, it can produce reduced models by setting some coefficients exactly to zero. Similar to the lasso, the elastic net simultaneously performs automatic variable selection and continuous shrinkage¹⁵. We subsequently applied a feature importance ranking method (Supplemental Fig. 1). Finally, we extracted nine variables for recurrence and 13 variables for survival (Fig. 1).
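To show how an elastic net penalty drives weak coefficients exactly to zero while shrinking the others, the sketch below runs coordinate descent on a toy problem. It is an assumption-laden illustration rather than the study's implementation: it uses a squared-error loss for brevity (the study's outcomes are binary), and the function names, the penalty strength `lam`, the mixing weight `alpha`, and the data are all hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent minimizing
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # The lasso part thresholds; the ridge part enlarges the denominator.
            beta[j] = soft_threshold(rho / n, lam * alpha) / (z / n + lam * (1 - alpha))
    return beta

# Toy data: the outcome depends strongly on feature 0 and only weakly on feature 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3 * a + 0.1 * b for a, b in X]
beta = elastic_net(X, y, lam=0.5, alpha=0.5)
print(beta)
```

With these settings the weak feature's coefficient is set exactly to zero (variable selection), while the strong feature's coefficient is shrunk below its unpenalized value of 3 (continuous shrinkage).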

Synthetic minority oversampling technique (SMOTE)

The imbalanced data problem is a situation in which data are biased toward one class when applying ML classification algorithms¹⁶. When modeling with imbalanced data, an ML algorithm tends to improve its overall performance by predicting the majority class, in which most patients are concentrated, resulting in lower predictability for the minority class. Thus, the imbalanced data problem should be addressed using methods such as oversampling or undersampling. In the current study, we applied SMOTE to the training data set to address the imbalance problem¹⁷.
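The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that idea only; the function names, the neighbour count `k`, the seed, and the toy points are assumptions, and production work would typically rely on an established library implementation.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: euclidean(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical training set: 10 majority-class patients, 4 minority-class patients.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
n_majority = 10
new_points = smote(minority, n_new=n_majority - len(minority))
print(len(minority) + len(new_points))  # minority class size after oversampling
```

Oversampling is applied to the training set only, so that validation data still reflect the original class distribution.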

Statistical analysis and ML model development

We evaluated the performance of the following representative ML classification algorithms: logistic regression¹⁸, kernel support vector machine (SVM)¹⁹, decision tree²⁰, random forest²¹, naïve Bayes (NB)²², Extreme Gradient Boosting (XGBoost)²³, Natural Gradient Boosting (NGBoost)²⁴, LightGBM²⁵, and CatBoost²⁶. We adopted accuracy and F1-score to evaluate prediction performance. The F1-score combines the precision and recall metrics and is designed to be more informative on imbalanced data²⁷. We also performed subgroup analysis according to histologic type. Non-clear cell type RCC included eight types: papillary, chromophobe, collecting duct, unclassified, multilocular cystic, mixed, Xp11.2 translocation, and clear cell papillary. All statistical analyses were performed using IBM SPSS Statistics ver. 21.0 and Python ver. 3.7.6.
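The two evaluation metrics can be computed directly from predicted and true labels. The sketch below uses the standard definitions (F1 is the harmonic mean of precision and recall); the function name and the toy labels are hypothetical and are not taken from the study data.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Hypothetical cohort: 2 of 10 patients recur (label 1).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                     # ignores the minority class
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]       # one true positive, one false alarm

print(accuracy_and_f1(y_true, always_negative))
print(accuracy_and_f1(y_true, one_hit))
```

Both toy predictors reach the same accuracy, but only the F1-score distinguishes the one that never detects a recurrence, which is why the two metrics are reported together on imbalanced data.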

Informed consent to patients

The waiver of the informed consent requirement was approved by the local ethics committee of Seoul National University Bundang Hospital considering the retrospective study design involving anonymized data.

Results

Patient characteristics

Distribution of data sets before and after SMOTE for recurrence (n = 6,717) and survival (n = 5,730) is described in Table 1. The ratio of training set to test set was 7:3. Overall survival rates at 3, 5, and 10 years were 94.2%, 90.6%, and 71.9%, respectively; and the recurrence-free rates were 85.2%, 78.8% and 45.3%, respectively.

Table 1 Distribution of data sets before and after synthetic minority oversampling technique application (SMOTE).

Subsequently, we compared patient characteristics and distribution of each variable for recurrence and survival (Table 2). In a comparative analysis between recurrence and non-recurrence groups, we found several significantly different variables except for four default variables (gender, age at surgery, smoking, and BMI): Eastern Cooperative Oncology Group (ECOG) performance status, symptoms at diagnosis, transfusion, pathologic T/N stages, sarcomatoid differentiation, necrosis, lymphovascular invasion (LVI), and Fuhrman nuclear grade (all p < 0.05). In terms of survival, ECOG performance status, symptoms at diagnosis, transfusion, pathologic T/N stages, sarcomatoid differentiation, necrosis, LVI, histologic type, Fuhrman nuclear grade, and recurrence were significant variables (all p < 0.05).

Table 2 Baseline characteristics.

Prediction model performance and external validation

The diagnostic performance of several machine learning algorithms for the prediction of 3-, 5-, and 10-year recurrence and survival is listed in Table 3. All models achieved high accuracy (range, 0.77–0.94) and F1-score (range, 0.77–0.97) in all tested metrics. Subsequently, external validation with the SNUBH dataset (n = 3,169) was performed using all models (Fig. 2). High accuracy and F1-score were well maintained in external validation for both recurrence and survival (Supplemental Table 1).

Table 3 Diagnostic performance of machine learning algorithms for the prediction of recurrence and survival.

Subgroup analysis

For the subgroup analysis according to histologic type (clear vs. non-clear cell type RCC), the dataset distribution before and after SMOTE for recurrence and survival is described in Supplemental Table 2. In these subgroups, accuracy (range, 0.64–0.91) and F1-score (range, 0.72–0.94) remained generally high in all tested metrics (Supplemental Tables 3 and 4).

Discussion

Two recent studies using the original KORCC database⁹ have been reported^28,29. First, Byun et al.²⁸ assessed the prognosis of non-metastatic clear cell RCC using a deep learning-based survival prediction model. Harrell's C-indices of DeepSurv for recurrence and cancer-specific survival were 0.802 and 0.834, respectively. More recently, Kim et al.²⁹ developed an ML-based algorithm predicting the probability of recurrence at 5 and 10 years after surgery. The highest area under the receiver operating characteristic curve (AUROC) was obtained from the naïve Bayes (NB) model, with values of 0.836 and 0.784 at 5 and 10 years, respectively.

In the current study, we used the updated KORCC database. It now contains clinical data of more than 10,000 patients. To the best of our knowledge, this is the largest dataset of an Asian population with RCC. With this dataset, we could develop more accurate models, with accuracy ranging from 0.77 to 0.94 and F1-score from 0.77 to 0.97 (Table 3). These accuracy values were relatively high compared to those of previous models, including the Kattan nomogram, the Leibovich model, and the GRANT score, which were around 0.7^5,6,7,8. Among them, the Kattan nomogram was developed using a cohort of 601 patients with clinically localized RCC, and the overall C-index was 74%⁵. In a subsequent analysis of the same patient group using additional prognostic variables including tumor necrosis, vascular invasion, and tumor grade, the C-index was as high as 82%³⁰. Even so, their prediction accuracies were not as high as ours.

In addition, we could include short-term (3-year) recurrence and survival data, which would be helpful for developing a more sophisticated surveillance strategy. Another strength of the current study was that most algorithms introduced to date were applied^18,19,20,21,22,23,24,25,26, showing relatively consistent performance with high accuracy. Finally, we also performed an external validation using a separate (SNUBH) cohort and achieved well-maintained high accuracy and F1-score for both recurrence and survival (Fig. 2). External validation of prediction models is essential, especially when using a multi-institutional dataset, to check and correct for differences between institutions.

AUROC has mostly been used as the standard for evaluating the performance of prediction models^5,6,7,8,29. However, AUROC weighs changes in sensitivity and specificity equally without considering clinically meaningful information⁶. In addition, its limited ability to compare the performance of different ML models is another limitation of the AUROC technique³¹. Thus, we adopted accuracy and F1-score instead of AUROC as evaluation metrics. The F1-score, together with SMOTE¹⁷, is used as a more suitable metric for addressing imbalanced data problems²⁷.

RCC is not a single disease, but multiple histologically defined cancers with different genetic characteristics, clinical courses, and therapeutic responses³². With regard to metastatic RCC, the International Metastatic Renal Cell Carcinoma Database Consortium and the Memorial Sloan Kettering Cancer Center risk models have been extensively validated and widely used to predict survival outcomes of patients receiving systemic therapy^33,34. However, both risk models were developed without considering histologic subtypes. Thus, their predictive performance was presumed to have been strongly affected by clear cell type RCC, the predominant histologic subtype. Interestingly, in our previous study using the Korean metastatic RCC registry, we found that both risk models reliably predicted progression and survival even in non-clear cell type RCC³⁵. In the current study, after performing subgroup analysis according to histologic type (clear vs. non-clear cell type RCC), we also found high accuracy and F1-score in all tested metrics (Supplemental Tables 3 and 4). Taken together, these findings suggest that the prognostic difference between clear and non-clear cell type RCC seems to be offset in both metastatic and non-metastatic RCC. Further effort is needed to develop and validate a sophisticated prediction model for individual subtypes of non-clear cell type RCC.

The current study had several limitations. First, due to the paucity of long-term follow-up cases at 10 years, the data imbalance problem could not be avoided. Consequently, the recurrence-free rate at 10 years was only 45.3%. In the majority of patients, further long-term follow-up had not been performed when there was no evidence of disease at five years. However, we adopted both SMOTE and the F1-score to address these imbalanced data problems. The retrospective design of this study was also an inherent limitation. Another limitation was that the developed prediction model only included the Korean population. Validation of the model using data from other countries and races is also needed. With regard to non-clear cell type RCC, the current study cohort is still relatively small due to the rarity of the disease, so we could not avoid pooling the subtypes and analyzing them together. Thus, further studies are still needed to develop and validate a prediction model for each subtype. In addition, the lack of more rigorous validation methods such as cross-validation and bootstrapping is another limitation of the current study. Finally, web-embedded deployment of the model should follow to improve accessibility and transportability.

Conclusions

A novel ML algorithm for predicting recurrence and survival in localized RCC patients after surgery was successfully developed and validated using the updated KORCC database. This prediction model is anticipated to offer a differentiated surveillance protocol. It will be a useful tool for patient counseling.

Data availability

All data enquiries can be directed to the corresponding author.

References

Siegel, R. L., Miller, K. D., Fuchs, H. E. & Jemal, A. Cancer statistics, 2021. CA Cancer J. Clin. 71, 7–33. https://doi.org/10.3322/caac.21654 (2021).
National Cancer Registration Statistics, Ministry of Health and Welfare, Republic of Korea. https://www.cancer.go.kr (2019).
Gansler, T., Fedewa, S., Amin, M. B., Lin, C. C. & Jemal, A. Trends in reporting histological subtyping of renal cell carcinoma: Association with cancer center type. Hum. Pathol. 74, 99–108. https://doi.org/10.1016/j.humpath.2018.01.010 (2018).
Ferlay, J. et al. Estimating the global cancer incidence and mortality in 2018: GLOBOCAN sources and methods. Int. J. Cancer 144, 1941–1953. https://doi.org/10.1002/ijc.31937 (2019).
Kattan, M. W., Reuter, V., Motzer, R. J., Katz, J. & Russo, P. A postoperative prognostic nomogram for renal cell carcinoma. J. Urol. 166, 63–67 (2001).
Sorbellini, M. et al. A postoperative prognostic nomogram predicting recurrence for patients with conventional clear cell renal cell carcinoma. J. Urol. 173, 48–51. https://doi.org/10.1097/01.ju.0000148261.19532.2c (2005).
Leibovich, B. C. et al. Prediction of progression after radical nephrectomy for patients with clear cell renal cell carcinoma: A stratification tool for prospective clinical trials. Cancer 97, 1663–1671. https://doi.org/10.1002/cncr.11234 (2003).
Buti, S. et al. Validation of a new prognostic model to easily predict outcome in renal cell carcinoma: The GRANT score applied to the ASSURE trial population. Ann. Oncol. 28, 2747–2753. https://doi.org/10.1093/annonc/mdx492 (2017).
Byun, S. S. et al. The establishment of KORCC (KOrean Renal Cell Carcinoma) database. Investig. Clin. Urol. 57, 50–57. https://doi.org/10.4111/icu.2016.57.1.50 (2016).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. Eur. Urol. 67, 1142–1151. https://doi.org/10.1016/j.eururo.2014.11.025 (2015).
Mucherino, A., Papajorgji, P. & Pardalos, P. M. k-Nearest Neighbor Classification in Data Mining in Agriculture. Vol 34. (Springer, 2009). https://doi.org/10.1007/978-0-387-88615-2_4
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net (vol B 67, pg 301, 2005). J. R. Stat. Soc. B 67, 768–768. https://doi.org/10.1111/j.1467-9868.2005.00527.x (2005).
Scelo, G. & Larose, T. L. Epidemiology and risk factors for kidney cancer. J. Clin. Oncol. https://doi.org/10.1200/JCO.2018.79.1905 (2018).
Capitanio, U. et al. Epidemiology of renal cell carcinoma. Eur. Urol. 75, 74–84. https://doi.org/10.1016/j.eururo.2018.08.036 (2019).
Chen, G. et al. ProAcePred: Prokaryote lysine acetylation sites prediction based on elastic net feature optimization. Bioinformatics 34, 3999–4006. https://doi.org/10.1093/bioinformatics/bty444 (2018).
Li, D. C., Liu, C. W. & Hu, S. C. A learning method for the class imbalance problem with medical data sets. Comput. Biol. Med. 40, 509–518. https://doi.org/10.1016/j.compbiomed.2010.03.005 (2010).
Alghamdi, M. et al. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project. PLoS ONE 12, e0179805. https://doi.org/10.1371/journal.pone.0179805 (2017).
Liao, J. G. & Chin, K. V. Logistic regression for disease classification using microarray data: model selection in a large p and small n case. Bioinformatics 23, 1945–1951. https://doi.org/10.1093/bioinformatics/btm287 (2007).
Huang, M. W., Chen, C. W., Lin, W. C., Ke, S. W. & Tsai, C. F. SVM and SVM ensembles in breast cancer prediction. PLoS ONE 12, e0161501. https://doi.org/10.1371/journal.pone.0161501 (2017).
Song, Y. Y. & Lu, Y. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 27, 130–135. https://doi.org/10.11919/j.issn.1002-0829.215044 (2015).
Chan, J. C. W. & Paelinckx, D. Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sens. Environ. 112, 2999–3011. https://doi.org/10.1016/j.rse.2008.02.011 (2008).
Subbalakshmi, G., Ramesh, K. & Chinna, R. M. Decision support in heart disease prediction system using Naive Bayes. Indian J. Comput. Sci. Eng. 2, 170–176 (2011).
Chen, T. Q. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. Kdd'16: Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 785–794, https://doi.org/10.1145/2939672.2939785 (2016).
Duan, T. et al. NGBoost: Natural gradient boosting for probabilistic prediction. Pr. Mach. Learn. Res. 119, 71 (2020).
Ke, G. L. et al. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neur. In. 30, 89 (2017).
Dorogush, A.V., Ershov, V., & Gulin, A. CatBoost: gradient boosting with categorical features support. ArXiv, abs/1810.11363 (2018).
Takahashi, K., Yamamoto, K., Kuchiba, A. & Koyama, T. Confidence interval for micro-averaged F-1 and macro-averaged F-1 scores. Appl. Intell. 52, 4961–4972. https://doi.org/10.1007/s10489-021-02635-5 (2022).
Byun, S. S. et al. Deep learning based prediction of prognosis in nonmetastatic clear cell renal cell carcinoma. Sci. Rep. Uk 11, 21. https://doi.org/10.1038/s41598-020-80262-9 (2021).
Kim, H., Lee, S. J., Park, S. J., Choi, I. Y. & Hong, S. H. Machine learning approach to predict the probability of recurrence of renal cell carcinoma after surgery: Prediction model development study. Jmir Med. Inf. 9, 35 (2021).
Halligan, S., Altman, D. G. & Mallett, S. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: A discussion and proposal for an alternative approach. Eur. Radiol. 25, 932–939. https://doi.org/10.1007/s00330-014-3487-0 (2015).
Carrington, A. M. et al. A new concordant partial AUC and partial c statistic for imbalanced data in the evaluation of machine learning algorithms. BMC Med. Inform. Decis. 20, 1014 (2020).
Ricketts, C. J. et al. The cancer genome atlas comprehensive molecular characterization of renal cell carcinoma. Cell Rep. 23, 313. https://doi.org/10.1016/j.celrep.2018.03.075 (2018).
Heng, D. Y. C. et al. External validation and comparison with other models of the International Metastatic Renal-Cell Carcinoma Database Consortium prognostic model: A population-based study. Lancet Oncol. 14, 141–148. https://doi.org/10.1016/S1470-2045(12)70559-4 (2013).
Motzer, R. J. et al. Prognostic factors for survival in previously treated patients with metastatic renal cell carcinoma. J. Clin. Oncol. 22, 454–463. https://doi.org/10.1200/Jco.2004.06.132 (2004).
Kim, J. K. et al. Application of the international metastatic renal cell carcinoma database consortium and memorial sloan kettering cancer center risk models in patients with metastatic non-clear cell renal cell carcinoma: A multi-institutional retrospective study using the korean metastatic renal cell carcinoma registry. Cancer Res. Treat. 51, 758–768. https://doi.org/10.4143/crt.2018.421 (2019).

Author information

Authors and Affiliations

Department of Urology, Seoul National University Bundang Hospital, Seongnam, Korea
Jung Kwon Kim, Sangchul Lee, Sung Kyu Hong & Seok-Soo Byun
Department of Urology, Seoul National University College of Medicine, Seoul, Korea
Jung Kwon Kim, Sangchul Lee, Sung Kyu Hong, Cheol Kwak & Chang Wook Jeong
Department of Urology, Seoul National University Hospital, Seoul, Korea
Cheol Kwak & Chang Wook Jeong
Department of Urology, Korea University Anam Hospital, Seoul, Korea
Seok Ho Kang
Department of Urology, Seoul St. Mary’s Hospital, The Catholic University of Korea, Seoul, Korea
Sung-Hoo Hong
Department of Urology, Chungbuk National University Hospital, Cheongju, Korea
Yong-June Kim
Department of Urology, National Cancer Center, Goyang, Korea
Jinsoo Chung
Department of Urology, Chonnam National University Medical School, Gwangju, Korea
Eu Chang Hwang
Department of Urology, Kyungpook National University Chilgok Hospital, Daegu, Korea
Tae Gyun Kwon
Department of Medical Device Development, Seoul National University College of Medicine, Seoul, Korea
Seok-Soo Byun & Yu Jin Jung
The IMC Lnc., Daegu, Korea
Junghyun Lim, Jiyeon Kim & Hyeju Oh

Contributions

Conception: J.K.K., S.S.B., H.O. Data collection: S.L., S.K.H., C.K., C.W.J., S.H.K., S.H.H., Y.J.K., J.C., E.C.H., T.G.K. Data analysis: J.K.K., Y.J.J., J.L., J.K., H.O. Draft manuscript: J.K.K., C.W.J., S.H.K., Y.J.K., J.C., T.G.K. Reviewed manuscript: S.K.H., C.K., S.H.H., E.C.H., S.S.B. Revised manuscript: J.K.K., H.O. All authors consented to the publication of this study.

Corresponding author

Correspondence to Seok-Soo Byun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kim, J.K., Lee, S., Hong, S.K. et al. Machine learning based prediction for oncologic outcomes of renal cell carcinoma after surgery using Korean Renal Cell Carcinoma (KORCC) database. Sci Rep 13, 5778 (2023). https://doi.org/10.1038/s41598-023-30826-2

Received: 28 June 2022
Accepted: 02 March 2023
Published: 08 April 2023
DOI: https://doi.org/10.1038/s41598-023-30826-2
