Introduction

Prostate cancer is most commonly screened for in men by measurement of prostate-specific antigen (PSA) in blood. While systematic PSA screening clearly reduces prostate cancer mortality, the low specificity of PSA for aggressive disease also leads to overdiagnosis and overtreatment1,2,3,4. Elevated PSA or a suspicious digital rectal examination (DRE) will commonly prompt transrectal ultrasound (TRUS)-guided biopsies for histopathological evaluation of the presence of any tumors. TRUS biopsy follows a standardized, systematic procedure in which 10–12 needle cores are taken from the prostate. However, the suboptimal specificity of PSA for prostate cancer leads to many unnecessary biopsies as well as to detection of many clinically insignificant prostate cancers, which left untreated would not give rise to any symptoms in the man’s normal lifespan5. Consequently, a more accurate assessment of the need for TRUS biopsy could reduce both unneeded biopsies and overdiagnosis of clinically insignificant cancers.

Previously, a prespecified statistical model based on four kallikrein markers (free PSA, intact PSA, total PSA, and kallikrein-related peptidase 2 (hK2)) measured in blood and combined with age and DRE findings, commercially available as the 4Kscore, has been demonstrated to accurately predict the presence of Grade group 2 or higher prostate cancer in patients undergoing TRUS biopsy6,7,8,9. Similarly, a previous study has shown that the expression levels of three microRNAs in extracellular vesicle-enriched cell-free urine, when combined in a ratio model (called uCaP), could distinguish between benign prostatic hyperplasia and prostate cancer with greater accuracy than PSA10.

In this study, we aimed to investigate (i) the values of the 4Kscore in a single institution cohort of Danish men undergoing initial TRUS biopsy, (ii) whether the uCaP score could predict Grade group 2 or higher prostate cancer (GG2 or above) on biopsy, and (iii) whether a combination of the 4Kscore and uCaP could further enhance predictive accuracy.

Materials and methods

We identified 240 men (median age 67; quartiles 61–72) who were referred by their general practitioner to the Department of Urology, Aarhus University Hospital, Denmark (2015–2018) for initial TRUS-guided systematic 10+ core biopsy based on clinical indications (elevated PSA and/or suspicious DRE). All research was performed in accordance with relevant guidelines/regulations and in accordance with the Declaration of Helsinki. All patients provided written informed consent and the study was approved by The Central Denmark Region Committees on Health Research Ethics (reference nr. 1-10-72-367-13) and notified to the Danish Data Protection Agency (reference nr. 1-16-02-248-14). Prior to biopsy, the DRE status for each patient was re-evaluated by a urologist and used in the subsequent models. All cores were histopathologically evaluated as part of clinical routine and the highest GG was reported for each patient.

Cryopreserved EDTA plasma samples were shipped on dry ice to Lund University in Malmö, Sweden for measurements of kallikrein levels conducted in 2018–2019 blind to outcome. Total and free PSA levels were measured using the AutoDelfia 1235 automatic immunoassay system using the dual-label DELFIA Prostatus total/free PSA-Assay (Perkin-Elmer, Turku, Finland) calibrated against the World Health Organization (WHO) 96/670 (PSA-WHO) and WHO 68/668 (free PSA-WHO) standards. Intact PSA and human kallikrein-related peptidase 2 (hK2) were measured with F(ab')2 fragments of the monoclonal capture antibodies to reduce the frequency of nonspecific assay interference, as previously reported11. We excluded six patients without available kallikrein measurements, leaving us with a final cohort of 234 men.

Our outcome was defined as International Society of Urological Pathology (ISUP) Grade Group 2 or higher (equivalent to Gleason score 3 + 4 or higher) prostate cancer on biopsy. We compared the kallikrein panel (“4Kscore”) to a “base model” which only included total PSA, age, and DRE results. Coefficients for both models were built on the ProtecT cohort6 and were locked down before the data from the current cohort were received; that is, this is an independent validation study of a prespecified model. To evaluate the discriminative accuracy, we calculated the area under the receiver operating characteristic curve (AUC) for both the base model and the 4Kscore, and used the DeLong, DeLong, and Clarke-Pearson method for inferences on the difference in AUC. To assess the level of agreement between the 4Kscore predictions and the actual risk of Grade group 2 or higher cancer on biopsy, we used a calibration plot. Finally, to determine the clinical value of the 4Kscore, we used decision curve analysis to compare the net benefit of this model to that of the base model and of the biopsy-all and biopsy-none strategies. As the 4Kscore has previously been reported to be useful in men with modestly elevated PSA levels13, our primary analysis was carried out in patients with a total PSA value ≤ 25 ng/ml. As a sensitivity analysis, we carried out all the aforementioned analyses after additionally including patients with a total PSA value > 25 ng/ml.
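The net benefit used in decision curve analysis can be illustrated with a minimal Python sketch. It assumes the standard definition (true positives per patient minus false positives per patient, weighted by the odds of the threshold probability); the function names and the toy data are hypothetical and are not taken from the study, whose analyses were run in Stata.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of biopsying every man whose predicted risk >= threshold.

    y_true: 1 if Grade group 2 or higher cancer was found, else 0.
    risk: predicted probabilities from a model (hypothetical values here).
    threshold: threshold probability, strictly between 0 and 1.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_biopsy_all(y_true, threshold):
    """Biopsy-all strategy: equivalent to assigning everyone a risk of 1."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Toy data, invented for illustration only.
toy_outcome = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_risk = [0.8, 0.6, 0.3, 0.5, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

for pt in (0.10, 0.25, 0.40):
    model_nb = net_benefit(toy_outcome, toy_risk, pt)
    all_nb = net_benefit_biopsy_all(toy_outcome, pt)
    # The biopsy-none strategy has a net benefit of 0 by definition.
    print(f"threshold={pt:.2f} model={model_nb:.3f} biopsy-all={all_nb:.3f}")
```

A decision curve plots these values across a range of threshold probabilities; a model is clinically useful at a given threshold only if its net benefit exceeds that of both biopsy-all and biopsy-none.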

As a secondary aim, we were interested in ascertaining whether the uCaP score could add to the base model to predict Grade group 2 or higher prostate cancer. The uCaP score was calculated based on expression levels of three microRNAs (miR-200b-3p, miR-27b-3p, and miR-30b-5p) measured in first void morning urine collected by patients at their home in a 50-mL Falcon tube containing one Stabilur® tablet on the day of, and prior to, TRUS biopsy, as described previously10,14. We excluded 48 patients without an available uCaP score because either (1) the patient was unable to provide a urine sample (n = 15) or (2) at least one of the three microRNA assays was below the detection limit or failed quality control (n = 33). We created a multivariable logistic regression model with Grade group 2 or higher prostate cancer on biopsy as the outcome, the uCaP score (entered as a non-linear term using restricted cubic splines with knots at the tertiles) as the predictor, and total PSA, age, and DRE as covariates, which were included in the model as linear predictors based on coefficients from the ProtecT cohort6. As the coefficients for this uCaP model were built on the current dataset, in contrast to the prespecified coefficients for the 4Kscore, which were based on the ProtecT cohort, we used repeated tenfold cross-validation to evaluate the discriminative accuracy of this model when calculating the AUC. We then used the DeLong, DeLong, and Clarke-Pearson method to assess differences in AUC between the uCaP model and the 4Kscore. We additionally included the uCaP model in the decision curve analysis.
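The reason for cross-validating a model whose coefficients are estimated on the same data can be shown with a short Python sketch of repeated k-fold cross-validation. Several things here are assumptions made for illustration: the AUC is computed as the rank-based probability that a case outranks a control, out-of-fold predictions are pooled within each repeat and the AUC is averaged across repeats (the aggregation used in the study is not stated), and `fit_direction` and `predict_direction` are a deliberately trivial stand-in for the spline-based logistic regression.

```python
import random


def auc(y_true, score):
    """AUC: probability that a random case outranks a random control.
    Ties count as one half."""
    pos = [s for y, s in zip(y_true, score) if y == 1]
    neg = [s for y, s in zip(y_true, score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold_auc(x, y, fit, predict, k=10, repeats=5, seed=0):
    """Mean AUC of out-of-fold predictions over repeated k-fold splits."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        out_of_fold = [None] * n
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            # The model never sees the patients it is evaluated on.
            model = fit([x[i] for i in train], [y[i] for i in train])
            for i in fold:
                out_of_fold[i] = predict(model, x[i])
        aucs.append(auc(y, out_of_fold))
    return sum(aucs) / len(aucs)


# Trivial stand-in model (hypothetical): learns only the direction of effect.
def fit_direction(xs, ys):
    cases = [v for v, t in zip(xs, ys) if t == 1]
    controls = [v for v, t in zip(xs, ys) if t == 0]
    higher_in_cases = sum(cases) / len(cases) >= sum(controls) / len(controls)
    return 1.0 if higher_in_cases else -1.0


def predict_direction(model, v):
    return model * v


# Deterministic toy data, invented for illustration only.
toy_y = [i % 2 for i in range(60)]
toy_x = [5 * (i % 2) + (i * 7) % 10 for i in range(60)]
cv_auc = repeated_kfold_auc(toy_x, toy_y, fit_direction, predict_direction)
print(f"cross-validated AUC: {cv_auc:.3f}")
```

Because each prediction comes from a model fitted without that patient, the resulting AUC is not inflated by overfitting, which makes it more comparable to the AUC of a model with prespecified coefficients.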

Finally, we were interested in determining whether the uCaP score adds predictive value to the 4Kscore. We created a multivariable logistic regression model with Grade group 2 or higher prostate cancer as the outcome, the uCaP score (non-linear) as the predictor and the 4Kscore as a covariate6. We then used a Wald test to assess whether the uCaP score improved the model, by testing whether the coefficients for the non-linear uCaP score variables were simultaneously equal to zero, and reported the AUC for this model using repeated tenfold cross-validation. All statistical analyses were conducted using Stata 15.0 (StataCorp, College Station, TX).

Results

We identified 234 patients referred for initial TRUS biopsy with measured total, free and intact PSA as well as hK2 levels in EDTA plasma samples. Of these, 29 had a total PSA > 25 ng/ml and were excluded, leaving 205 patients for analysis, whose characteristics are displayed in Table 1. Notably, this cohort is at higher risk than is commonly reported for biopsy cohorts6,15, with 34% of patients having a positive DRE prior to biopsy and nearly one fifth having very high-grade disease (GG 4 or 5).

Table 1 Patient and clinical characteristics (N = 205). All values are median (quartiles) or frequency (proportion).

On primary analysis, we found that the 4Kscore could predict Grade group 2 or higher prostate cancer in biopsies with high accuracy (AUC 0.763; 95% CI: 0.696, 0.829; Supplementary Table S1). However, the improvement in discrimination over the base model consisting of total PSA, age, and DRE was not statistically significant (base model AUC 0.733; 95% CI: 0.661, 0.805; difference 0.030; bootstrapped 95% CI: −0.020, 0.080; p-value = 0.2). Figure 1 depicts the calibration of the 4Kscore, where there appears to be miscalibration, with underestimation of risk at lower probabilities (this is also observed in the base model, Supplementary Fig. S1). However, the Hosmer–Lemeshow goodness-of-fit test p-value was 0.085, failing to reject the null hypothesis of good calibration. The decision curve analysis is shown in Fig. 2. As expected from the miscalibration at lower predicted probabilities, both the base model and the 4Kscore are inferior to the strategy of biopsying all men unless the threshold probability of aggressive disease is relatively high (~ 25% or higher).

Figure 1 Calibration showing the predicted versus actual Grade group 2 or higher cancer detection using the 4Kscore (Hosmer–Lemeshow goodness-of-fit p = 0.085).
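The comparison underlying a calibration plot can be sketched in a few lines of Python: patients are ordered by predicted risk, split into groups, and the mean predicted risk in each group is set against the observed proportion with Grade group 2 or higher cancer. The grouping into equal-sized bins, the function name and the toy data are assumptions for illustration; they do not reproduce the study's plot.

```python
def calibration_table(y_true, risk, groups=5):
    """Return (mean predicted risk, observed event rate, group size)
    for each of `groups` equal-sized groups ordered by predicted risk."""
    if len(y_true) != len(risk):
        raise ValueError("y_true and risk must have the same length")
    if len(y_true) < groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    rows = []
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        predicted = sum(risk[i] for i in members) / len(members)
        observed = sum(y_true[i] for i in members) / len(members)
        rows.append((predicted, observed, len(members)))
    return rows


# Toy data, invented for illustration only.
toy_risk = [0.1, 0.1, 0.2, 0.2, 0.6, 0.6, 0.8, 0.8]
toy_outcome = [0, 1, 0, 1, 1, 0, 1, 1]
toy_rows = calibration_table(toy_outcome, toy_risk, groups=2)

for predicted, observed, size in toy_rows:
    print(f"predicted={predicted:.2f} observed={observed:.2f} n={size}")
```

In this toy example the low-risk group has an observed rate well above its mean predicted risk, which is the pattern described as underestimation of risk at lower probabilities.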

Figure 2 Decision curve analysis comparing the 4Kscore (blue dashed line), base-model (green dashed line), treat-all (orange solid line), and treat-none (red solid line) strategies (N = 205).

A sensitivity analysis including patients with high total PSA (n = 29 with PSA > 25 ng/ml) did not materially affect our results. With a wider range of PSA values in the sample, discrimination increased for both models but the difference between models was similar, with AUCs of 0.802 and 0.780 for the 4Kscore and the base model, respectively (p-value = 0.2). Calibration and decision analyses were similar (data not shown). Another sensitivity analysis excluding patients with PSA > 10 ng/ml (n = 61), who were therefore likely to be biopsied independent of the 4Kscore, yielded further reduced discrimination, with AUCs of 0.723 and 0.695 for the 4Kscore and base model, respectively (p-value = 0.5). All results of our sensitivity analyses were consistent with our primary analysis: discrimination was higher for the 4Kscore than for the base model, but not statistically significantly so. The results were therefore consistent after excluding patients who are at higher risk and who, it could be argued, should undergo TRUS biopsy regardless of their 4Kscore prediction.

Among the subset of patients with available uCaP scores (n = 157), the median score was 6.8 (quartiles 6.0, 7.4). Patient characteristics among this subset are also shown in Table 1. Figure 3 shows the distribution of the uCaP score, as well as the risk of Grade group 2 or higher prostate cancer on biopsy based on uCaP scores, with covariates set at the mean. We found evidence of an association between the uCaP score and Grade group 2 or higher prostate cancer (non-linear association, overall p-value = 0.039) after adjusting for age, DRE, and total PSA. For example, the probability of Grade group 2 or higher prostate cancer is 28% for a patient with a uCaP score of 6.3, and 57% for a patient with the same baseline risk and a uCaP score of 8. The uCaP model had an AUC of 0.759 (95% CI 0.680, 0.839), compared to the AUC of 0.758 (95% CI 0.682, 0.834) for the 4Kscore (p-value > 0.9) in this subcohort of men with available uCaP scores (Supplementary Table S1). Supplementary Fig. S2 shows the decision curve, where the model with the uCaP score has a net benefit equal to or better than other models previously evaluated. After adjusting for the 4Kscore, we did not find evidence of an association between the uCaP score and Grade group 2 or higher prostate cancer (non-linear association test p-value = 0.092); the AUC for this model was 0.766 (95% CI 0.688, 0.844). Supplementary Fig. S3 presents the decision curve only among men with PSA ≤ 10 ng/ml, as patients with PSA > 10 ng/ml would be biopsied independent of their 4Kscore.

Figure 3 Probability of Grade group 2 or higher disease on biopsy based on uCaP score estimated from the multivariable model with uCaP score, total PSA, age and result of digital rectal exam (solid line; 95% confidence intervals depicted as dashed lines) overlaid on the distribution of uCaP score.

Discussion

We evaluated the ability of the 4Kscore to detect Grade group 2 or higher prostate cancer in an independent Danish cohort of men undergoing TRUS biopsy, where we observed high discrimination (AUC = 0.763; 95% CI: 0.696, 0.829). Additionally, when adjusted for age, DRE, and total PSA levels, the urine microRNA model, uCaP, had similar accuracy to the 4Kscore for predicting Grade group 2 or higher prostate cancer on TRUS biopsy in patients with an available uCaP score (AUC = 0.759, 95% CI: 0.680, 0.839; AUC = 0.758, 95% CI: 0.682, 0.834; respectively). However, while there was no additional gain in discrimination by combining the two models (AUC = 0.766, 95% CI: 0.688, 0.844), both models were more accurate than the base model alone (AUC = 0.733; 95% CI: 0.661, 0.805), though this improvement was not statistically significant (test of equality of AUC between base model, 4Kscore, and uCaP model, p-value = 0.5).

The ability of the 4Kscore to improve the prediction of Grade group 2 or higher prostate cancer in biopsy-naïve men over total PSA alone is in line, in terms of direction, with previous studies in other cohorts6,16,17. The AUC found here is comparable to that reported in a large meta-analysis, which included data from 12 studies and close to 17,000 patients (AUC = 0.81 for detecting GG ≥ 2)18. However, the magnitude of the improvement over the base model was smaller than in other studies.

The study has potential limitations. The cohort we used here had higher risk than is commonly reported for biopsy cohorts6,15, with 34% of patients having a positive DRE prior to biopsy and nearly one fifth having GG 4 or 5. Among the 205 patients, 38% (95% CI: 31%, 45%) had Grade group 2 or higher cancer on biopsy. For comparison, the ProtecT cohort6, upon which the coefficients for the 4Kscore were built, reported only 12.9% with Grade group 2 or higher cancer, and < 2% with GG 4 or 5. A possible explanation for this discrepancy could be that in the ProtecT cohort, men were invited for a PSA test (i.e. PSA screening), while the cohort in this study encompassed patients referred by their general practitioner upon clinical suspicion of prostate cancer. This is also reflected in the median PSA levels for patients with positive TRUS biopsy in the two cohorts, with 7.5 ng/ml here and 5.4 ng/ml in the ProtecT cohort.
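As a quick arithmetic check on the reported prevalence, the short Python sketch below computes a normal-approximation (Wald) interval for a proportion of 38% in 205 patients. The text does not state which interval method was used, so the Wald formula is an assumption here, as is the function name.

```python
import math


def wald_interval(proportion, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error


# Values taken from the text: 38% of 205 patients.
lower, upper = wald_interval(0.38, 205)
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

Rounded to whole percentages, this approximation agrees with the interval of 31% to 45% given above.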

More recently, multiparametric MRI (mpMRI) with MRI-targeted biopsies has shown improvement for prostate cancer detection over TRUS biopsy19,20,21. This is also reflected in the recently updated European guidelines for prostate cancer22, which now recommend mpMRI before biopsy. Consequently, future studies might include patients referred to mpMRI scans, to ensure the 4Kscore and uCaP remain useful in this setting as well. A recent study in which the 4Kscore was combined with the PI-RADS score has suggested that this is indeed the case for the 4Kscore23. Nonetheless, a potential criticism of our study may be that we used as an endpoint Grade group 2 or higher prostate cancer on systematic biopsy, whereas many contemporary biopsies are done with MRI guidance. We have two responses. First, MRI-targeted biopsy is far from universal. Due to lack of equipment and trained personnel, mpMRI is a very limited resource, unavailable at many centers, and still misses some prostate cancers that may be detected by the random TRUS biopsy approach5,24. Consequently, TRUS biopsy may be the preferred or only available option at many centers. Second, concerns have been raised that the apparently superior results of MRI may be an artifact of grade inflation25. There is evidence that cancers found by MRI-targeted biopsies are not, grade-for-grade, equivalent to those found on systematic biopsy. Specifically, high-grade cancers found only on targeted biopsy appear to be far less aggressive26. Hence our findings remain robust for the identification of clinically significant cancer. Similar considerations apply to template biopsies.

Conclusions

In conclusion, our findings provide further support for the clinical use of the 4Kscore to predict Grade group 2 or higher cancers in men being considered for biopsy. Promising results for uCaP warrant confirmatory research.