In response to the News & Views article by Carlsson and Kattan (Personalized risk — stratified screening or abandoning it altogether? Nat. Rev. Clin. Oncol. 13, 140–142 (2016))1, we would like to thank the authors for their acknowledgement of, and positive remarks on, the Stockholm 3 (STHLM3) study, in which we were involved2. We agree with their view that a blanket rejection of prostate-specific antigen (PSA)-based screening for prostate cancer is ill-advised and would lead to reduced opportunities to prevent death from prostate cancer. Neither do we believe such a rejection to be practically feasible. Clearly, the way forward is to improve our approach to prostate-cancer screening to permit early and accurate diagnosis of disease in men who need treatment, and to avoid overdiagnosis and unnecessary biopsies in those who do not. In light of this fundamentally important aim, we would like to add clarification on a few points raised by Carlsson and Kattan regarding the STHLM3 study.

First, Carlsson and Kattan1 questioned the applicability of the STHLM3 model in the clinical setting, in which men with elevated serum PSA levels are subject to additional workup before a decision is made on whether to perform a biopsy. The aim of STHLM3 was to develop a tool to improve high-volume screening in the primary-care setting, building on the findings of the European Randomised Study of Screening for Prostate Cancer (ERSPC)3. Thus, the rational decision was to use PSA ≥3 ng/ml as a comparator in the STHLM3 study, in order to infer the same mortality reduction as that observed using this cutoff in the ERSPC. Nevertheless, because further workup in men with elevated levels of PSA is currently common practice, the authors' remark deserves attention, and we will address this issue in a forthcoming publication, in which we compare results from using the STHLM3 model for biopsy recommendations with current clinical practice in Stockholm, Sweden.

Second, we disagree with Carlsson and Kattan1 that the STHLM3 investigators failed to address whether the genetic score included in the STHLM3 model, which is based on 232 single-nucleotide polymorphisms associated with prostate cancer, adds predictive value. Pepe et al.4 have reported that demonstrating statistical significance as an independent predictor in a multivariable analysis is sufficient evidence of the value of a biomarker; such evidence is provided for the genetic score in Table 2 of the STHLM3 study publication by Grönberg et al.2 Testing additionally for an improvement in the area under the curve (AUC) would be redundant and, therefore, unnecessary4.

Third, Carlsson and Kattan1 noted that calibration of the STHLM3 model was not reported by Grönberg et al.2 We argue that 'discrimination' (that is, the ability to discriminate between cases and controls) is the most important property of a classification model: a poorly calibrated model with high discriminatory power is highly useful, whereas a well-calibrated model with poor discriminatory power is of limited value. Moreover, poor calibration can always be fixed, provided enough data are available5. Having said that, we agree that a well-calibrated predictive model is desirable; Fig. 1 shows the excellent calibration of the STHLM3 model.
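The distinction between discrimination and calibration can be illustrated with a minimal Python sketch on simulated data. Nothing here is STHLM3 data or the STHLM3 model: the Beta-distributed risks, the cubic distortion and the helper names (`auc`, `fit_binned_recalibration`, `apply_recalibration`) are assumptions made purely for illustration. Because the AUC depends only on the ranking of predictions, a monotone distortion leaves it unchanged while badly damaging calibration; a simple decile-based recalibration fitted on separate data then restores agreement between mean predicted and observed risk.

```python
import random

def mean(values):
    return sum(values) / len(values)

def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random control."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average rank shared by tied scores
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_binned_recalibration(scores, labels, n_bins=10):
    """Learn the observed event rate within each score decile."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    size = len(order) // n_bins
    edges, rates = [], []
    for b in range(n_bins):
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        idx = order[b * size:end]
        edges.append(scores[idx[-1]])  # upper edge of this bin
        rates.append(mean([labels[k] for k in idx]))
    return edges, rates

def apply_recalibration(score, edges, rates):
    for edge, rate in zip(edges, rates):
        if score <= edge:
            return rate
    return rates[-1]

# Simulated cohort (hypothetical): true risks and outcomes drawn from them.
rng = random.Random(0)
true_risk = [rng.betavariate(2, 5) for _ in range(20000)]
outcome = [1 if rng.random() < p else 0 for p in true_risk]

# A monotone distortion keeps the ranking but understates every risk.
distorted = [p ** 3 for p in true_risk]

auc_true = auc(true_risk, outcome)
auc_distorted = auc(distorted, outcome)

# Recalibrate on the first half, evaluate on the second half.
half = len(outcome) // 2
edges, rates = fit_binned_recalibration(distorted[:half], outcome[:half])
recalibrated_test = [apply_recalibration(s, edges, rates) for s in distorted[half:]]
outcome_test = outcome[half:]

print("AUC, true vs distorted:", auc_true, auc_distorted)
print("Observed rate vs mean distorted risk:", mean(outcome), mean(distorted))
print("Held-out observed rate vs mean recalibrated risk:",
      mean(outcome_test), mean(recalibrated_test))
```

Binning by decile is only one of several possible recalibration methods and is used here because it needs no external libraries. It collapses scores within a bin, so in practice a smoother monotone method might be preferred.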

Figure 1: Calibration plot of the STHLM3 model for predicting high-risk prostate cancer.

The graph shows the calibration of the model, that is, the agreement between the predicted and observed risk of high-risk prostate cancer (Gleason score ≥7), based on the results from the 5,344 biopsies performed in the STHLM3 validation cohort. The red line indicates perfect correspondence between predicted and observed risk (perfect calibration) and the black line shows the calibration of the STHLM3 model. The orange shaded area indicates the 95% confidence interval, and the tick marks above the x-axis show deciles of the risk distribution, each representing one tenth of the population. The graph was produced using the R language and the gbm package10,11.

Fourth, Carlsson and Kattan1 correctly point out that the disease prevalence in the overall STHLM3-study population remains unknown, as biopsies were not performed in all participating men. For ethical and practical reasons, performing biopsies in men with low PSA levels was not deemed appropriate, a feature the STHLM3 study shares with virtually all other prostate cancer diagnostic studies. For example, the Prostate Health Index (PHI) and the 4KScore have been validated as reflexive tests in cohorts of men with increased PSA levels (usually defined as a serum PSA concentration above 2–4 ng/ml)6,7,8,9, making it difficult to infer that reductions in prostate-cancer mortality observed with these tests are equivalent to those associated with PSA screening using 3 ng/ml as a cutoff for biopsy. STHLM3 is, to our knowledge, the only prospective prostate cancer diagnostic study that demonstrates prevented biopsies and decreased overdiagnosis, without decreasing the detection of high-grade tumours.

Finally, we agree with Carlsson and Kattan's1 view that informing doctors and patients about the individual probability of high-risk prostate cancer on a continuous scale, rather than according to risk group, could be relevant for clinical decision-making. In the ongoing clinical implementation of the STHLM3 model, the individual's risk of having a prostate cancer with a Gleason score ≥7 is reported to the doctor who ordered the test. Many patients (and, indeed, doctors) find it difficult, however, to conceptualize the risks and prefer a clearly stated recommendation on the appropriate course of action.

We hope that these clarifications address the questions posed by Carlsson and Kattan1 on the performance characteristics of the STHLM3 model.