Introduction

Globally, prostate cancer (PCa) was the second most frequent cancer and the fifth leading cause of cancer death in men in 2020, with an estimated 1.4 million new cases and 375,000 deaths worldwide 1. Radical prostatectomy (RP) is an established treatment for localized PCa 2. An elevated prostate-specific antigen (PSA) level is the most sensitive and specific early indicator of PCa recurrence after RP. Biochemical failure (BF) is defined as persistent, detectable PSA levels after RP (i.e., persistent PSA) or two consecutive PSA level increases of 0.2 ng/mL or more after a period of PSA normalization (i.e., biochemical recurrence). BF occurs in 30–40% of patients within 10 years after RP and is associated with poorer cancer-specific outcomes 3.

Multi-parametric magnetic resonance imaging (mpMRI) is a highly sensitive tool for detecting clinically significant PCa 4,5. It also detects adverse pathological features in PCa patients, such as extracapsular invasion or lymph node metastasis 6,7,8,9,10. Some studies have reported that bi-parametric MRI (bpMRI), which is mpMRI without the dynamic contrast-enhanced sequence, achieves a diagnostic accuracy for PCa similar to that of mpMRI 11. In addition, bpMRI is highly cost-effective when compared with mpMRI, and supports diagnosis and risk stratification in PCa patients 12. In this study, we evaluated the added value of bpMRI for BF prediction in PCa patients. We developed and validated a pre-surgical model, which included bpMRI parameters and clinical variables, to predict BF.

Methods

Research and validation cohorts

From June 2018 to January 2020, we retrospectively analyzed 967 consecutive patients who underwent prostate bpMRI and RP. Exclusion criteria were: (1) patients who did not receive standardized MRI scans or underwent MRI scans at other centers; (2) patients on neoadjuvant therapies; (3) patients with a history of transurethral resection of the prostate; (4) patients who received hormone therapy or radiotherapy after RP before BF; and (5) patients with insufficient clinical data or undetermined PSA results from postoperative follow-up. Finally, 446 patients met our inclusion criteria (Fig. 1). Patients were then randomized into research (n = 335) and validation (n = 111) cohorts at a 3:1 ratio. All detected lesions were evaluated and classified according to PI-RADS v2.1 guidelines 13. If a patient had multiple lesions in the same PI-RADS category, the lesion with the largest diameter was taken as the index lesion.

Figure 1

Calibration plot showing mean predicted risk in the validation cohort. (A) Calibration plot of the baseline model, (B) Calibration plot of the MRI model.

BpMRI protocols

BpMRI was performed using a 3 T MRI system (Verio, Siemens, Germany) and involved only T2WI and DWI (b value = 2000 s/mm²), which were the dominant sequences used to characterize the transition and peripheral zones, respectively 13. The prostate volume was measured on bpMRI. All lesions were evaluated by senior personnel using PI-RADS v2.1 scores. The prostate MRI regional model was defined using the following four-zone method. The prostate was trisected along its axis: the lower third was defined as the apex zone and the upper third as the basal zone. The middle third was further divided into peripheral and non-peripheral zones. Under the four-zone method, a zone was defined as positive if it contained the major part of a lesion or if a lesion involved more than half of the zone. Therefore, patients with multiple lesions could also have multiple positive zones. Extraprostatic extension (EPE) and seminal vesicle invasion (SVI) on bpMRI were also recorded.

Prediction model design

The baseline model comprised commonly used clinical variables: age at biopsy, body mass index (BMI), PSA at diagnosis, PSA density, suspicious digital rectal examination (DRE) (yes/no), biopsy pathology (ISUP grade), and surgical technique (robot-assisted radical prostatectomy or laparoscopic radical prostatectomy). The MRI model included these predictors plus PI-RADS score (1, 2, 3, 4, or 5), EPE at bpMRI (yes/no), SVI at bpMRI (yes/no), the zonal location of suspected lesions (apex region, basal region, central peripheral zone, and central non-peripheral zone), maximum diameter of the suspected lesion, and clinical stage (T1, T2, or T3). The outcome was BF. Postoperative PSA levels were first measured 1–2 months after RP and then at 3-month intervals in the second year; patients whose interval between measurements exceeded 6 months were deemed lost to follow-up.

Statistical analysis

We developed and validated two multivariable logistic regression models to predict BF after RP. We recalibrated each risk model in the validation cohort by fitting a logistic regression on the logit of the predicted risk 14. A calibration slope near 1 indicated good calibration of the prediction model. The discrimination of both models was assessed and compared using the area under the curve (AUC) of the receiver operating characteristic (ROC). Model fit was evaluated using calibration plots 14. False positive rates (FPR) and true positive rates (TPR) were used to evaluate the accuracy of postoperative BF prediction. The TPR was the proportion of patients with BF whose predicted risk was above the threshold, while the FPR was the proportion of patients without BF whose predicted risk was above the same threshold. The clinical value of the prediction model was assessed using the proportion of avoided BFs, the net benefit (NB), and the net reduction (NR) in false positives (FPs) 15.
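The TPR, FPR, and AUC definitions above can be illustrated with a minimal sketch. The function names, the toy data, and the choice to flag patients whose risk is at or above the threshold are assumptions made for illustration; they are not taken from the study's analysis code.

```python
def rates_at_threshold(risks, outcomes, threshold):
    """Return (TPR, FPR) when patients with risk >= threshold are flagged."""
    tp = sum(1 for r, y in zip(risks, outcomes) if y == 1 and r >= threshold)
    fp = sum(1 for r, y in zip(risks, outcomes) if y == 0 and r >= threshold)
    n_pos = sum(outcomes)              # patients with BF
    n_neg = len(outcomes) - n_pos      # patients without BF
    return tp / n_pos, fp / n_neg


def auc(risks, outcomes):
    """AUC as the share of (BF, non-BF) pairs ranked correctly; ties count 0.5."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study: predicted risks and outcomes (1 = BF).
risks = [0.05, 0.10, 0.15, 0.30, 0.35, 0.60, 0.70, 0.90]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]

tpr, fpr = rates_at_threshold(risks, outcomes, threshold=0.20)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}, AUC={auc(risks, outcomes):.3f}")
```

The pairwise AUC is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; a rank-based implementation would be preferable for much larger data sets.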

We estimated 95% confidence intervals (CIs) and standard errors (SEs) for the performance measures of each prediction model, and for the differences between the two models, from 2000 bootstrap samples drawn by randomly selecting patients with replacement. In the research cohort, we refitted the prediction model and recalculated the predicted risk of every model in every sample. The 95% CIs were taken as the 2.5th and 97.5th percentiles of the resampling distribution. In the validation cohort, the data for the resampling process comprised the outcome (presence or absence of postoperative BF) and the uncalibrated predicted risk from each risk model. In every sample, the simple recalibration model was refitted, and the calibrated predicted risk was then recalculated. We compared variable distributions between the research and validation cohorts. Categorical variables were assessed using χ2 tests, and continuous variables were analyzed using Wilcoxon tests. All tests were two-sided, and a P value < 0.05 indicated statistical significance.
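The percentile bootstrap described above can be sketched as follows. This is a simplified illustration under stated assumptions: it resamples predicted risks and outcomes only, without the model refitting or recalibration steps, and the helper names (`auc`, `bootstrap_ci`), the seed, and the toy data are hypothetical.

```python
import random


def auc(risks, outcomes):
    """AUC as the share of (BF, non-BF) pairs ranked correctly; ties count 0.5."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(risks, outcomes, statistic, n_boot=2000, seed=0):
    """Percentile 95% CI of `statistic` from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(risks)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one outcome class
            estimates.append(statistic([risks[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[n_boot * 25 // 1000]        # 2.5th percentile
    upper = estimates[n_boot * 975 // 1000 - 1]   # 97.5th percentile
    return lower, upper


# Hypothetical toy data, not from the study: predicted risks and outcomes (1 = BF).
risks = [0.05, 0.10, 0.15, 0.30, 0.35, 0.60, 0.70, 0.90]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1]

lower, upper = bootstrap_ci(risks, outcomes, auc)
print(f"AUC={auc(risks, outcomes):.3f}, 95% CI: {lower:.3f} to {upper:.3f}")
```

To mirror the research-cohort procedure more closely, the `statistic` callable would refit the logistic regression inside each resample before computing the performance measure.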

Ethical approval and consent to participate

All methods were performed in accordance with relevant guidelines and regulations. This retrospective study received ethical approval from the Hospital Ethics Committee of the First Affiliated Hospital of Nanjing Medical University. Written informed consent was obtained from all subjects.

Results

Study population

In accordance with our exclusion criteria, we finally selected 446 consecutive patients. We then randomly assigned 335 patients to the research cohort and 111 patients to the validation cohort, and both cohorts were separately included in the models. Patient demographics in both cohorts are shown in Table 1. In the research cohort (median [inter-quartile range (IQR)] age = 69 [63–74] years) and the validation cohort (median [IQR] age = 69 [64–74] years), the postoperative BF incidence was 22.39% (n = 75) and 27.03% (n = 30), respectively. Age at biopsy, BMI, PSA, abnormal DRE, PI-RADS v2.1 category, ISUP grade, and surgical technique were similar between the research and validation cohorts.

Table 1 Patient Demographics of Research and Validation Cohort.

The MRI characteristics of both cohorts are shown in Table 2. The zonal location of the index lesion, maximum diameter of the index lesion, MRI EPE, seminal vesicle invasion, and clinical stage were similar between the research and validation cohorts (P > 0.05).

Table 2 MRI Characteristics of Research and Validation Cohort.

The prediction model

Among the clinical variables, PSA and ISUP grade groups (GG) 3, 4, and 5 were independent predictors in the baseline model and remained statistically significant in the MRI model (Table 3). The risk of BF was positively associated with PSA and increased with GG3, GG4, GG5, and a lesion in the central peripheral zone. In both the research and validation cohorts, the calibration plot showed that the MRI model had a better fit than the baseline model (Fig. 1).

Table 3 Logistic Regression Prediction Models of Biochemical Failure for Research Cohort.

In the research cohort, the AUC increased from 0.780 for the baseline model to 0.857 for the MRI model (P < 0.05) (Fig. 2A and Table 4). In the validation cohort, the AUC increased from 0.753 for the baseline model to 0.865 for the MRI model (P < 0.05) (Fig. 3A and Table 5).

Figure 2

Plot showing the performance metrics of the research cohort. (A) Receiver operating characteristic curves of risk prediction models for CS prostate cancer, (B) TPR and FPR, (C) Net benefit (%), (D) Net reduction in false-positives (%) of the three risk prediction models.

Table 4 Performance of the two Risk Prediction Models in the Research Cohort.
Figure 3

Plot showing the performance metrics of the validation cohort. (A) Receiver operating characteristic curves of risk prediction models for CS prostate cancer, (B) TPR and FPR, (C) Net benefit (%), (D) Net reduction in false-positives (%) of the three risk prediction models.

Table 5 Performance of the two Risk Prediction Models in the Validation Cohort.

TPR and FPR values for the models in the research cohort are shown in Table 4 and Fig. 2B. TPR and FPR values for the calibrated risk models in the validation cohort are shown in Table 5 and Fig. 3B. The FPR of the MRI model was lower than that of the baseline model, with only a minimal loss of TPR.

Decision curve analysis (DCA)

Figure 2C, D shows the NBs and NRs in FPs for the research cohort, and Fig. 3C, D shows the NBs and NRs in FPs for the validation cohort. We then applied the MRI model to the validation cohort. At risk thresholds ≥ 15%, the NB of both prediction models was consistently higher than that of the “treat all” and “treat none” strategies (“all model” and “none model”) (Figs. 2C and 3C). For instance, at a 20% risk cut-off, the NB was 3 (95% CI: 0–9) in the “all model”, 14 (95% CI: 7–23) in the baseline model, and 18 (95% CI: 11–28) in the MRI model, and the NR in FPs was 0 in the “all model” (treat all), 19 (95% CI: 6–37) in the baseline model, and 32 (95% CI: 0–56) in the MRI model. The NB of the MRI model was equivalent to identifying 18 BFs per 100 men without any false-positive predictions, four more than the baseline model. Compared with the “treat all” strategy, the NR in FPs with the MRI model was equivalent to 32 fewer false-positive BF predictions per 100 men, with no increase in the number of missed BFs. Overall, with the MRI model, 66% (95% CI: 53%–90%) of “treat all” interventions could be avoided, while 83% (95% CI: 63%–94%) of postoperative BFs were identified. In contrast, the baseline model avoided 53% (95% CI: 33%–76%) of “treat all” interventions at this threshold and identified 87% (95% CI: 75%–100%) of postoperative BFs.
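For readers unfamiliar with decision curve analysis, the NB and NR quantities can be sketched with the formulation commonly used in the decision-curve literature. This formulation is an assumption here, since the exact computation (reference 15) is not reproduced in the text, and the function names are hypothetical. As a plausibility check, applying the “treat all” strategy to the research cohort counts reported above (75 BFs among 335 patients) at a 20% threshold gives a net benefit of about 3 per 100 men, in line with the NB of 3 reported at that cut-off.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit per patient: TP/n - FP/n * pt / (1 - pt)."""
    odds = threshold / (1.0 - threshold)
    return tp / n - fp / n * odds


def net_reduction_fp_per_100(nb_model, nb_all, threshold):
    """Net reduction in false positives per 100 patients versus 'treat all'."""
    odds = threshold / (1.0 - threshold)
    return (nb_model - nb_all) / odds * 100.0


# "Treat all" in the research cohort: all 75 BFs count as true positives and
# the remaining 335 - 75 = 260 patients count as false positives.
nb_all = net_benefit(tp=75, fp=260, n=335, threshold=0.20)
# "Treat none" flags nobody, so its net benefit is zero by construction.
nb_none = net_benefit(tp=0, fp=0, n=335, threshold=0.20)

print(f"treat-all net benefit per 100 men: {100 * nb_all:.1f}")
print(f"treat-none net benefit per 100 men: {100 * nb_none:.1f}")
```

The threshold odds pt / (1 − pt) express how many false positives a clinician would accept per true positive, which is why the same model can be beneficial at one threshold and not at another.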

Discussion

With the emergence of different treatments for localized PCa, preoperative risk stratification of PCa patients is extremely important. BF is an ideal early prognostic indicator in PCa after RP. Previous studies reported that BF occurred when tumor tissue remained at surgery (i.e., positive margin and/or subclinical lymphatic metastasis) or when cancer had already disseminated beyond the prostate and outside the surgical field at surgery (i.e., minimal residual disease) 16,17.

Several commonly used multivariable risk tools based on pre-diagnosis PSA, T stage by DRE, and biopsy grade group categories have been used to predict postoperative PSA outcomes 18,19. Several studies reported that including MRI-derived parameters in a risk model increased the accuracy of biochemical recurrence (BCR) prediction. For example, a multivariable model including MRI PI-RADS, along with clinical and pathological variables, outperformed the European Association of Urology classification and CAPRA scores for predicting BCR (C-index: 77% vs. 62% vs. 60%, respectively) 20. Moreover, in another study 8, a pre-surgical model incorporating PI-RADS, fusion-targeted biopsy grade, and extraprostatic extension on MRI showed better accuracy in predicting BCR (AUC = 0.68–0.71) when compared with the D’Amico classification (AUC = 0.66–0.71). However, these studies used BCR as the endpoint, and persistent PSA levels (> 0.2 ng/mL) after RP also required preoperative intervention. In Soga et al., three subgroups were defined in terms of the D’Amico risk classification (low, intermediate, and high) and the GP score (Gleason score multiplied by PSA). No significant difference was observed in the non-BF rate between the low-risk and low GP score subgroups or between the intermediate-risk and intermediate GP score subgroups. However, the non-BF rate of the high GP score subgroup was significantly lower than that of the high-risk subgroup (42.1% vs. 66.1%, P = 0.008). In multivariate analyses, a high GP score (P = 0.001; hazard ratio (HR): 3.78; 95% CI: 1.95–7.35) was a significant independent risk factor for BF after prostatectomy. However, these prediction models were limited to clinical parameters 21. Teloken et al. reported that transition zone location indicated better BCR-free survival after adjusting for adverse clinicopathological features 22. Shin et al. showed that the zonal location of lesions on MRI, in addition to the PI-RADS category, may help estimate postoperative BF risk 9. These studies confirmed the role of MRI in predicting BF, but they did not develop prediction models. When MRI parameters were included in our prediction model, we observed better model fit and higher diagnostic accuracy, avoided more BFs, and maintained a level of sensitivity to BFs similar to that of the baseline model.

We used DCA to compare the NBs of both risk prediction models with those of the “treat none” and “treat all” strategies. “Treat none” refers to RP for localized PCa, while “treat all” refers to neoadjuvant androgen deprivation, extended radical surgery, and lymph node dissection. In clinical settings, the risk threshold for “treat all” may be determined after physicians and patients weigh the relative harms of an aggressive treatment regimen against the benefits of identifying postoperative BFs. Thus, there was no single risk threshold for deciding who required treatment, but rather a range of risk thresholds. Because of the higher adverse-effect profile and the disputed curative effects of “treat all”, we selected high risk thresholds for our DCA. Our novel MRI model also demonstrated better calibration and higher NBs when compared with the baseline model. Our DCA data indicated that including index lesion location on bpMRI in the prediction model yielded better model fit and higher predictive accuracy, thereby decreasing unnecessary treatments while maintaining a BF sensitivity similar to that of the baseline model.

Study limitations

Our model data were similar to previous data. However, our study had several limitations: it was a retrospective, single-center study and was only internally validated. In addition, this study was based on bpMRI, which may introduce some bias compared with mpMRI. These factors may have caused some verification bias, and the findings may not be generalizable 23.

Conclusions

Using preoperative clinical and MRI-related variables, we developed and validated an MRI-based model that predicts BF in patients after RP. This model could be helpful in clinical settings.