Enhancement of prostate cancer diagnosis by machine learning techniques: an algorithm development and validation study



To investigate the value of machine learning (ML) in enhancing prostate cancer (PCa) diagnosis.


Consecutive systematic prostate biopsies performed from January 2003 to June 2017 were used as the training cohort, and prospective biopsies performed from July 2017 to November 2019 were used as the validation cohort. Men were included if prostate-specific antigen (PSA) was 0.4–50 ng/mL and digital rectal examination (DRE) findings, transrectal ultrasound (TRUS) prostate volume, and TRUS abnormality were known. Clinically significant PCa (csPCa) was defined as Gleason 3 + 4 or above. The area under the receiver operating characteristic (ROC) curve (AUC) was compared between PSA, PSA density, the European Randomized Study of Screening for Prostate Cancer risk calculator (ERSPC-RC), and various ML techniques using PSA, DRE and TRUS information. The ML techniques were XGBoost, LightGBM, CatBoost, support vector machine (SVM), logistic regression (LR), and random forest (RF), with cost-sensitive learning applied.
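To make the comparison concrete, the following minimal Python sketch shows how PSA density, a rank-based AUC, and inverse-frequency class weights (one common form of cost-sensitive learning) can be computed. It is an illustration only, not the study's pipeline: the toy cohort, the function names `psa_density`, `auc` and `balanced_class_weights`, and the choice of weighting scheme are all assumptions. The weighting formula mirrors what scikit-learn calls `class_weight="balanced"`.

```python
from collections import Counter


def psa_density(psa_ng_ml, volume_ml):
    """PSA density: serum PSA divided by TRUS prostate volume."""
    return psa_ng_ml / volume_ml


def auc(scores, labels):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).
    The rarer class receives the larger misclassification cost."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Hypothetical toy cohort: (PSA in ng/mL, TRUS volume in mL, csPCa on biopsy)
cohort = [
    (4.2, 60.0, 0), (6.8, 85.0, 0), (5.1, 30.0, 1), (9.5, 95.0, 0),
    (7.3, 35.0, 1), (4.9, 45.0, 0), (12.0, 40.0, 1), (8.0, 70.0, 0),
]
psa = [p for p, _, _ in cohort]
psad = [psa_density(p, v) for p, v, _ in cohort]
y = [label for _, _, label in cohort]

auc_psa = auc(psa, y)      # discrimination of PSA alone
auc_psad = auc(psad, y)    # discrimination of PSA density
weights = balanced_class_weights(y)
print(auc_psa, auc_psad, weights)
```

In this invented cohort PSA density separates the classes better than PSA alone, but that is a property of the toy numbers and says nothing about the study data.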


The training and validation cohorts included 3881 and 778 consecutive men, respectively. The RF model performed better than the other ML techniques, PSA, PSA density and ERSPC-RC for prediction of PCa or csPCa in the validation cohort. For csPCa prediction, the AUCs of PSA, PSA density, ERSPC-RC and RF were 0.71, 0.80, 0.83 and 0.88, respectively. At 90–95% sensitivity for csPCa, the RF model achieved a negative predictive value (NPV) of 97.5–98.0% and avoided 38.3–52.2% of unnecessary biopsies. Decision curve analyses (DCA) showed that the RF model provided net clinical benefit over PSA, PSA density and ERSPC-RC.
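The relationship between a fixed sensitivity, NPV and avoided biopsies can be sketched as follows. This is a hedged illustration on invented risk scores, not the study's code. It assumes that men with a score at or above the cut-off are biopsied and that "unnecessary biopsies avoided" means the fraction of men without csPCa who fall below the cut-off; the abstract does not state the exact definition used. The function names are hypothetical.

```python
def threshold_for_sensitivity(scores, labels, target):
    """Return the highest cut-off whose sensitivity (score >= cut-off)
    is at least `target`."""
    n_pos = sum(labels)
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        if tp / n_pos >= target:
            return cut
    raise ValueError("target sensitivity not reachable")


def biopsy_metrics(scores, labels, cut):
    """NPV and fraction of unnecessary biopsies avoided at a cut-off."""
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    avoided = tn / (tn + fp)  # assumed definition, see text above
    return npv, avoided


# Hypothetical predicted risks: 10 men with csPCa, 10 without
pos_scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.15]
neg_scores = [0.75, 0.55, 0.45, 0.35, 0.3, 0.25, 0.2, 0.1, 0.08, 0.05]
scores = pos_scores + neg_scores
labels = [1] * len(pos_scores) + [0] * len(neg_scores)

cut = threshold_for_sensitivity(scores, labels, target=0.9)
npv, avoided = biopsy_metrics(scores, labels, cut)
print(cut, npv, avoided)
```

Raising the target sensitivity lowers the cut-off, which generally raises NPV but reduces the share of biopsies that can be avoided; this is the trade-off behind the ranges reported above.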


Using the same clinical parameters, ML techniques performed better than ERSPC-RC or PSA density in predicting csPCa and could avoid up to 50% of unnecessary biopsies.


Fig. 1: Decision curve analyses.
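For readers unfamiliar with decision curve analysis, the quantity plotted is the net benefit at each threshold probability, commonly defined as NB(pt) = TP/n − (FP/n) × pt/(1 − pt). The sketch below computes it for a model and for the "biopsy everyone" strategy on invented data. The scores, the function names and the single threshold shown are assumptions for illustration and do not reproduce the figure.

```python
def net_benefit(scores, labels, pt):
    """Net benefit of biopsying men whose predicted risk is >= pt."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of biopsying every man regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical predicted risks: 10 men with csPCa, 10 without
pos_scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.15]
neg_scores = [0.75, 0.55, 0.45, 0.35, 0.3, 0.25, 0.2, 0.1, 0.08, 0.05]
scores = pos_scores + neg_scores
labels = [1] * len(pos_scores) + [0] * len(neg_scores)

pt = 0.5
nb_model = net_benefit(scores, labels, pt)
nb_all = net_benefit_treat_all(labels, pt)
nb_none = 0.0  # biopsying no one has zero net benefit by definition
print(nb_model, nb_all, nb_none)
```

A full decision curve repeats this calculation over a range of threshold probabilities and plots each strategy; a model shows net clinical benefit where its curve lies above both the treat-all and treat-none lines.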



Author information



Corresponding authors

Correspondence to Kup-Sze Choi or Jeremy Yuen-Chun Teoh.

Ethics declarations

Competing interests

The authors declare no competing interests.


About this article


Cite this article

Chiu, P.KF., Shen, X., Wang, G. et al. Enhancement of prostate cancer diagnosis by machine learning techniques: an algorithm development and validation study. Prostate Cancer Prostatic Dis (2021).
