To investigate the value of machine learning (ML) in enhancing prostate cancer (PCa) diagnosis.
Consecutive systematic prostate biopsies performed from January 2003 to June 2017 were used as the training cohort, and prospective biopsies performed from July 2017 to November 2019 were used as the validation cohort. Men were included if their prostate-specific antigen (PSA) level was 0.4–50 ng/mL and if digital rectal examination (DRE) findings, transrectal ultrasound (TRUS) prostate volume, and TRUS abnormality status were known. Clinically significant PCa (csPCa) was defined as Gleason score 3 + 4 or above. The area under the curve (AUC) of the receiver operating characteristic (ROC) was compared between PSA, PSA density, the European Randomized Study of Screening for Prostate Cancer (ERSPC) risk calculator (ERSPC-RC), and various ML techniques using PSA, DRE and TRUS information. The ML techniques were XGBoost, LightGBM, CatBoost, support vector machine (SVM), logistic regression (LR), and random forest (RF), with cost-sensitive learning applied.
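As a rough illustration of the quantities compared above, the following sketch computes PSA density (PSA divided by TRUS prostate volume), a rank-based AUC, and one common cost-sensitive weighting heuristic. It is not the study's code: the function names, the toy patient values, and the inverse-frequency class weights are all assumptions made for illustration, since the abstract does not state which cost-sensitive scheme was used.

```python
from typing import Dict, Sequence


def psa_density(psa_ng_ml: float, volume_ml: float) -> float:
    """PSA density: serum PSA (ng/mL) divided by TRUS prostate volume (mL)."""
    if volume_ml <= 0:
        raise ValueError("prostate volume must be positive")
    return psa_ng_ml / volume_ml


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed heuristic: weight each class by n / (k * n_class), so that
    errors on the rarer class cost more during training."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * list(labels).count(c)) for c in classes}


# Hypothetical toy cohort (not data from the study).
psa = [4.0, 6.0, 10.0, 5.0, 8.0, 12.0]          # ng/mL
volume = [80.0, 20.0, 100.0, 25.0, 40.0, 30.0]  # mL
cspca = [0, 0, 0, 1, 1, 1]                      # 1 = csPCa on biopsy

density = [psa_density(p, v) for p, v in zip(psa, volume)]
print("AUC, PSA alone:  ", round(auc(psa, cspca), 3))
print("AUC, PSA density:", round(auc(density, cspca), 3))
print("Class weights:   ", balanced_class_weights([0, 0, 0, 1]))
```

The same `auc` function can be applied to the predicted probabilities of any fitted classifier, which is how a single metric allows a raw marker, a risk calculator, and an ML model to be compared on one validation cohort.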
The training and validation cohorts included 3881 and 778 consecutive men, respectively. The RF model performed better than the other ML techniques and than PSA, PSA density and ERSPC-RC for prediction of PCa or csPCa in the validation cohort. For csPCa prediction, the AUCs of PSA, PSA density, ERSPC-RC and RF were 0.71, 0.80, 0.83 and 0.88, respectively. At 90–95% sensitivity for csPCa, the RF model achieved a negative predictive value (NPV) of 97.5–98.0% and avoided 38.3–52.2% of unnecessary biopsies. Decision curve analyses (DCA) showed that the RF model provided net clinical benefit over PSA, PSA density and ERSPC-RC.
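The threshold-based results and the decision curve comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the scores and labels are invented, the helper names are hypothetical, and "unnecessary biopsies avoided" is assumed here to mean the share of men without csPCa who fall below the cut-off, which may differ from the definition used in the full paper. Net benefit follows the standard decision curve formula, TP/n − (FP/n) × pt/(1 − pt).

```python
import math
from typing import Dict, Sequence


def threshold_for_sensitivity(scores: Sequence[float], labels: Sequence[int],
                              target: float) -> float:
    """Highest cut-off at which at least `target` of positives score >= cut-off."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("no positive cases")
    # Number of positives that must be captured; epsilon guards float rounding.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]


def biopsy_metrics(scores: Sequence[float], labels: Sequence[int],
                   cutoff: float) -> Dict[str, float]:
    """Sensitivity, NPV, and (assumed definition) share of unnecessary
    biopsies avoided when only men with score >= cutoff are biopsied."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
        "avoided_unnecessary": tn / (tn + fp),
    }


def net_benefit(scores: Sequence[float], labels: Sequence[int], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt."""
    if not 0 < pt < 1:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


# Hypothetical predicted csPCa risks and biopsy outcomes (not study data).
risk = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90]
cspca = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cutoff = threshold_for_sensitivity(risk, cspca, target=1.0)
print("cut-off:", cutoff, biopsy_metrics(risk, cspca, cutoff))
print("net benefit, model:     ", net_benefit(risk, cspca, pt=0.2))
print("net benefit, biopsy all:", net_benefit([1.0] * len(cspca), cspca, pt=0.2))
```

A decision curve repeats the `net_benefit` calculation over a range of `pt` values for each predictor and for the "biopsy all" strategy; a model shows net clinical benefit where its curve lies above the alternatives.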
Using the same clinical parameters, ML techniques performed better than ERSPC-RC or PSA density in predicting csPCa and could avoid up to 50% of unnecessary biopsies.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Chiu, P.KF., Shen, X., Wang, G. et al. Enhancement of prostate cancer diagnosis by machine learning techniques: an algorithm development and validation study. Prostate Cancer Prostatic Dis (2021). https://doi.org/10.1038/s41391-021-00429-x