Main

The prostate cancer diagnostic pathway is very different to that of almost all other solid organ cancers, in that it is calibrated to detect subclinical disease but often misses clinically important disease (Shaw et al, 2014). The imprecision comes from the transrectal ultrasound-guided biopsy (TRUS biopsy) – a semirandom deployment of needles into the prostate – that is the standard recommendation to a man with an elevated PSA. Men in whom diagnostic uncertainty remains unresolved often require repeat biopsy (Abraham et al, 2015).

Multiparametric magnetic resonance imaging (mpMRI) incorporates a number of imaging sequences that assess anatomy and tissue characteristics such as cellular density and vascularity (Valerio et al, 2014; Siddiqui et al, 2015; Turkbey et al, 2016). It could be used as a triage diagnostic test by identifying those men who might avoid a repeat prostate biopsy (Bossuyt et al, 2006). The Prostate Imaging Compared to Transperineal Ultrasound-guided biopsy for significant prostate cancer Risk Evaluation (PICTURE) trial (Simmons et al, 2014) was designed to overcome methodological limitations related to the use of either TRUS biopsy or radical prostatectomy as reference standards, with the former being inaccurate and the latter incorporating selection biases as men had to both test positive for cancer on a TRUS biopsy and then choose to undergo surgery. It is likely that these inherently different populations (contingent on the method of histological verification) harbour different burdens of disease.

The PICTURE trial was a paired-cohort validating confirmatory study designed to provide level 1b evidence on the diagnostic accuracy of mpMRI in men who required further biopsies (Phillips et al, 2009; Valerio et al, 2015). In other words, our study comprised men who had tested either negative on a first TRUS biopsy and had some indication for a repeat evaluation or tested positive and required some form of reclassification. Our reference standard was transperineal template prostate mapping biopsies (TTPM biopsies), which are both accurate and avoid many of the described biases. Transperineal template prostate mapping biopsies can be applied to almost all men under evaluation and overcome the random error of TRUS biopsy by sampling the whole prostate every 5 mm (Crawford et al, 2013).

Materials and methods

The PICTURE trial was a single-centre, ethics committee-approved, registered validating confirmatory study reported in accordance with STARD (Bossuyt et al, 2003). The full details of our protocol have been published (Simmons et al, 2014). Ethics committee approval for the study was granted by London City Road and Hampstead National Research Ethics Committee (reference 11/LO/1657) and the trial was registered on 6 December 2011. The study opened to recruitment on 11 January 2012 and completed recruitment on 29 January 2014.

Eligibility

Men were eligible for the study if they had undergone prior TRUS biopsy and were advised to undergo further biopsies as part of standard care.

Index test

All eligible men underwent the index test (mpMRI) using a 3 T magnetic field strength scanner with a pelvic-phased array coil. Magnetic resonance imaging sequences included T1-weighted, T2-weighted, diffusion weighting with high b-value (b=2000) sequence and apparent diffusion coefficient map using multiple b-values (b=0, 150, 500, 1000) and dynamic contrast enhancement with gadolinium (Magnevist). Complete sequence details can be found in our protocol publication (Simmons et al, 2014). Multiparametric MRIs were reported by an expert urologic radiologist with over 5 years of experience in interpreting prostate MRIs. The radiologist was blinded to previous TRUS-biopsy results, but was given the PSA level and any other risk factors. Reporting was carried out using a 5-point Likert scale for the likelihood of the presence of clinically significant disease: scores of ‘1’ and ‘2’ were designated for prostates ‘highly unlikely’ and ‘unlikely’ to harbour clinically significant prostate cancer, respectively, scores of ‘4’ and ‘5’ for glands ‘likely’ and ‘highly likely’ to harbour clinically significant disease, and a score of ‘3’ for glands in which the likelihood of the presence of clinically significant cancer was equivocal. This scoring system was based on the outputs of a consensus group (Dickinson et al, 2011) that was convened before the publication of the PIRADS mpMRI reporting consensus (Barentsz et al, 2012), although the two systems have subsequently been found to be similar (Rosenkrantz et al, 2013; Rastinehad et al, 2015). A random selection of 50 cases was re-reported by a second expert radiologist to allow assessment of interobserver variability.

Reference test

Patients were blinded to the mpMRI results to minimise non-compliance and selection bias. Men underwent the reference test (TTPM biopsies), performed according to a set protocol without image registration and regardless of the mpMRI findings or score. In summary, mapping using 5 mm sampling was obtained using core needles inserted via a brachytherapy grid fixed on a stepper. In most prostates, two biopsies at each grid point were required to sample the full craniocaudal gland length. All biopsies were reported by one of two expert uropathologists, each with >20 years of experience, who were blinded to the mpMRI reports. All negative biopsies were double-reported for quality control. The cancer core length (CCL) was reported as the actual amount of cancer seen in each core without counting the intervening areas of benign glands.

Target condition

As it was inappropriate to use histological criteria for clinical significance developed for TRUS biopsy, disease significance was defined by criteria developed and validated for use with TTPM biopsies (Ahmed et al, 2011). Our primary outcome (definition 1) was based on the presence of dominant Gleason pattern 4 or greater (i.e., Gleason ≥4+3) and/or a CCL involvement of ≥6 mm in any one location of any Gleason score. We used definition 2 for secondary outcome analyses (Gleason ≥3+4 and/or CCL ≥4 mm) as well as the presence of any Gleason score 7 or more.
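The two histological definitions can be written as a simple rule. The sketch below is illustrative only: the function name, its arguments and the per-patient summary it expects (Gleason primary and secondary patterns of the highest-grade cancer found, plus the maximum CCL in millimetres) are assumptions made for the example, not the trial's analysis code. It also assumes the stated thresholds are inclusive lower bounds.

```python
def is_clinically_significant(primary, secondary, max_ccl_mm, definition=1):
    """Return True if the biopsy findings meet the chosen definition.

    primary, secondary: Gleason patterns of the highest-grade cancer found
        (pass 0, 0 when no cancer was found).
    max_ccl_mm: maximum cancer core length in any one location, in mm.
    """
    if definition == 1:
        # Definition 1: dominant pattern 4 or greater, and/or CCL of 6 mm or more
        grade_hit = primary >= 4
        length_hit = max_ccl_mm >= 6
    elif definition == 2:
        # Definition 2: Gleason 3+4 or greater, and/or CCL of 4 mm or more
        grade_hit = primary >= 4 or (primary == 3 and secondary >= 4)
        length_hit = max_ccl_mm >= 4
    else:
        raise ValueError("definition must be 1 or 2")
    return grade_hit or length_hit
```

Under this rule, a 3 mm focus of Gleason 3+4 would count as significant under definition 2 but not under definition 1, whereas 6 mm of Gleason 3+3 would meet both on core length alone.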

Sample size calculation

The sample size calculation was performed for the primary objective of calculating the negative predictive value (NPV) of mpMRI, using a precision-based estimate (Flahault et al, 2005). Targeting an NPV of 90% for definition 1 disease, with a binomial 95% confidence interval (CI) of width 10%, the number of patients needed with an absence of clinically significant prostate cancer on the reference test was 139. Assuming a prevalence of 38% for UCL definition 1 disease in the population of interest, based on data from our centre that have since been published but were available at the time of designing PICTURE (Valerio et al, 2016), and assuming a sensitivity and specificity of 70% for mpMRI, an overall sample size of 316 patients was needed. As the prevalence of men without clinically significant disease on the reference test was not precisely known for the PICTURE study cohort, an interim analysis at 114 recruited men permitted an adjustment in recruitment to ensure that at least 139 men with a negative reference test were available for analysis.
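As a rough check on the figure of 139, a normal-approximation calculation for a single proportion gives the same number. This is a sketch under two assumptions that are not stated in the text: that a confidence width of 10% means a total interval width of 10 percentage points (a half-width of 0.05), and that a Wald-type formula is an acceptable stand-in for the cited precision-based method, which may differ in detail. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, total_width, confidence=0.95):
    """Sample size so that a Wald CI around proportion p has the given total width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = total_width / 2
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)


# Target NPV of 90% with a 95% CI that is 10 percentage points wide
n_negative = n_for_proportion(p=0.90, total_width=0.10)
print(n_negative)  # 139
```

The sketch does not attempt to reproduce the overall figure of 316, which depends on how the prevalence and accuracy assumptions were combined.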

Statistics

Sensitivity, specificity, PPV and NPV were calculated for all eligible men with binomial 95% CIs. For the primary outcome, the index test was regarded as positive at an mpMRI score of 3 or greater; secondary analyses used a score of 4 or greater as the threshold and applied the other definitions of clinical significance on TTPM biopsies. Interobserver agreement was assessed using absolute and weighted kappa and proportion of agreement, and by comparing the area under the receiver-operating characteristic (AUROC) curve for each reporter. The weighted versions allow for the magnitude of the disagreements to be taken into account. The weighting system used resulted in the weights 0.75, 0.5, 0.25 and 0 for MRI scores that differed by 1, 2, 3 and 4, respectively. STATA version 11.0 software was used for all analyses, with two-sided P=0.05 as the threshold for statistical significance.
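The weights quoted above correspond to linear weighting on a 5-point scale, w = 1 − |i − j|/4. A minimal sketch of a weighted kappa using those weights is given below; the function name and the example ratings are hypothetical, and the code is not the STATA routine used in the study.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with linear agreement weights w = 1 - |i - j| / (k - 1).

    Assumes the categories are consecutive integers. The result is undefined
    (division by zero) when expected agreement equals 1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    span = len(categories) - 1

    def weight(i, j):
        return 1 - abs(i - j) / span  # 1, 0.75, 0.5, 0.25, 0 on a 5-point scale

    # Observed weighted agreement across the paired reads
    observed = sum(weight(a, b) for a, b in zip(ratings_a, ratings_b)) / n

    # Expected weighted agreement from each reader's marginal score distribution
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(
        weight(i, j) * (freq_a[i] / n) * (freq_b[j] / n)
        for i in categories
        for j in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical reads from two reporters, for illustration only
reader_one = [1, 2, 3, 4, 5]
reader_two = [5, 4, 3, 2, 1]
print(weighted_kappa(reader_one, reader_two))
```

With linear weights, a disagreement of one Likert point is penalised far less than a disagreement between a score of 1 and a score of 5, which is why weighted agreement is expected to exceed absolute agreement.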

Results

Baseline characteristics

Three hundred and thirty men were enrolled and, following 81 withdrawals, 249 completed both mpMRI and TTPM biopsies (Figure 1, STARD flowchart). Men eligible for analysis had mean (s.d.) age 62 years (7), median (IQR) PSA 6.8 ng ml−1 (4.8–9.8) and median (IQR) number of previous biopsies 1 (1–2) and gland size 37 ml (26.8–50.0) (Table 1). One hundred and twenty-one (48.6%) had Gleason 6 disease on TRUS biopsy, while 52 (21.1%) had low volume Gleason 7 disease; 76 (30.5%) had no prior cancer. At TTPM biopsies, a median (IQR) 49 (40–55) cores were taken. Two hundred and nine of 249 (84%) in total had cancer on TTPM biopsy. The number of men free from clinically significant cancer on TTPM biopsies was 146 of 249 (59%), thus meeting our predefined sample size assumptions (Table 1).

Figure 1

PICTURE trial flowchart compliant with STARD.

Table 1 Patient demographics

Primary outcomes

Using an mpMRI score of ≥3 as a positive test result, 214 (86%) had a positive prostate mpMRI. For definition 1 clinically significant prostate cancer, sensitivity was 97.1% (92–99), specificity 21.9% (15.5–29.5), NPV 91.4% (76.9–98.1) and PPV 46.7% (35.2–47.8; Figure 2). Overall accuracy, as assessed by AUROC, was 0.74 (95% CI 0.68–0.80).
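The point estimates above follow from a 2×2 table. The counts used below are back-calculated from totals reported in this paper (249 men analysed, 146 without definition 1 disease, 214 with an mpMRI score of 3 or greater, and 3 significant cancers among the 35 men scoring 1 or 2); they are a reconstruction for illustration, not figures quoted directly from Figure 2. The exact (Clopper-Pearson) interval is an assumption about how the binomial CIs were computed, and the helper names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval, found by bisection on the binomial CDF."""

    def solve(f, target):
        # f is decreasing in p, so move right while f(p) is still above target
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Reconstructed counts for a positive test defined as an mpMRI score of 3 or greater
tp, fn, fp, tn = 100, 3, 114, 32

measures = {
    "sensitivity": (tp, tp + fn),
    "specificity": (tn, tn + fp),
    "NPV": (tn, tn + fn),
    "PPV": (tp, tp + fp),
}
for name, (x, n) in measures.items():
    lo, hi = exact_ci(x, n)
    print(f"{name}: {100 * x / n:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")
```

The point estimates come out at 97.1%, 21.9%, 91.4% and 46.7%. The intervals printed by this sketch depend on the interval method assumed and need not match the published ones to the last digit.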

Figure 2

Bar chart with associated contingency table demonstrating the histological outcome on TTPM biopsies for each MRI score when using the primary definition of clinically significant prostate cancer (Gleason ≥4+3 and/or maximum CCL ≥6 mm).

Using an mpMRI score of ≥4 as a positive test result, 129 (51.8%) had a positive mpMRI. For definition 1 disease, this conferred sensitivity of 80.6% (71.6–87.7), specificity of 68.5% (60.3–75.9), NPV of 83.3% (75.4–89.5) and PPV of 64.3% (55.4–72.6) (Figure 2). The negative likelihood ratio was 0.13 (0.04–0.42) and 0.28 (0.19–0.43) for mpMRI score thresholds of ≥3 and ≥4, respectively. Figure 2 illustrates histological outcomes on TTPM biopsies for each mpMRI score.
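The negative likelihood ratio is (1 − sensitivity)/specificity. The sketch below recomputes it for both thresholds from 2×2 counts that are back-calculated from the totals reported in this section (103 men with and 146 without definition 1 disease; 214 and 129 positive scans at thresholds of 3 and 4; 3 and 20 significant cancers in the test-negative groups). The counts are a reconstruction and the function name is hypothetical.

```python
def negative_likelihood_ratio(tp, fn, fp, tn):
    """(1 - sensitivity) / specificity from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (1 - sensitivity) / specificity


# Positive test defined as an mpMRI score of 3 or greater
lr_neg_3 = negative_likelihood_ratio(tp=100, fn=3, fp=114, tn=32)
# Positive test defined as an mpMRI score of 4 or greater
lr_neg_4 = negative_likelihood_ratio(tp=83, fn=20, fp=46, tn=100)

print(f"{lr_neg_3:.2f} {lr_neg_4:.2f}")  # 0.13 0.28
```

A lower value means that a negative scan shifts the odds of significant disease downwards more strongly, which is why the threshold of 3 performs better as a rule-out test despite its lower specificity.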

Secondary outcomes

We considered two scenarios using mpMRI to avoid a repeat biopsy. If an mpMRI score of ≥3 defined a positive test, this would potentially allow 35 (14%) to avoid a biopsy, with 89 of 214 (41%) clinically insignificant cancers (definition 1) detected (overdiagnosis) and 3 of 35 (9%) clinically significant cancers missed (underdiagnosis). If a score of ≥4 defined a positive mpMRI, 120 (48%) might avoid a biopsy, with 40 of 129 (31%) clinically insignificant cancers detected (overdiagnosis) and 20 of 120 (17%) clinically significant cancers missed (underdiagnosis; Table 2). When considering definition 2 clinically significant cancers, the probability of underdiagnosing this type of significant cancer increases to 1 in 3 (11 of 35) if men with an mpMRI reported as 1 or 2 wish to avoid a biopsy. Further, this would be 1 in 2 (54 of 120) if men with an mpMRI reported as 1, 2 or 3 wish to avoid biopsy (Table 3).
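The arithmetic behind the two triage scenarios for definition 1 disease can be laid out explicitly. In the sketch below, the men and cancers per score band are back-calculated from the figures in the paragraph above (for example, the score 3 band is taken as 120 − 35 = 85 men with 20 − 3 = 17 significant cancers); the band layout and function name are assumptions made for illustration.

```python
# (score band, men in band, definition 1 significant cancers in band)
bands = [
    ("1-2", 35, 3),
    ("3", 85, 17),
    ("4-5", 129, 83),
]


def triage_summary(bands, first_positive_index):
    """Summarise a strategy that biopsies only bands from first_positive_index on."""
    total = sum(men for _, men, _ in bands)
    spared = bands[:first_positive_index]
    avoided = sum(men for _, men, _ in spared)
    missed = sum(sig for _, _, sig in spared)
    return {
        "avoided": avoided,
        "avoided_pct": round(100 * avoided / total),
        "missed": missed,
        "missed_pct": round(100 * missed / avoided) if avoided else 0,
    }


print(triage_summary(bands, 1))  # biopsy scores 3 or greater
print(triage_summary(bands, 2))  # biopsy scores 4 or greater
```

The first call gives 35 men (14%) avoiding biopsy with 3 (9%) significant cancers missed; the second gives 120 men (48%) avoiding biopsy with 20 (17%) missed.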

Table 2 Number of men avoiding biopsy and diagnosis rates for a target definition for significance on TTPM-biopsy of definition 1 (Gleason ≥4+3 and/or maximum cancer core length ≥6 mm) for each MRI score
Table 3 Number of men avoiding biopsy and diagnosis rates for a target definition for significance on TTPM-biopsy of definition 2 (Gleason ≥3+4 and/or maximum cancer core length ≥4 mm) for each MRI score

Agreement on the subset of mpMRIs that were double read was 58% (n=29 of 50) with K=0.41 (s.e. 0.08), giving moderate agreement. Weighted agreement was 87.0% (K=0.52, s.e.=0.10) indicating good agreement. When comparing mpMRI scores for each reporter to histology on TTPM biopsies, there were minimal differences between each reporter in terms of AUROC analyses (reporter one AUROC 0.76 (0.63–0.90) vs reporter two 0.75 (0.61–0.89)).

In detecting and ruling out definition 2 clinically significant prostate cancer (Gleason ≥3+4 and/or CCL ≥4 mm of any Gleason score), with an mpMRI score of ≥3 as a positive test result, sensitivity was 93.5% (88.6–96.7), specificity 29.6% (20.0–40.8), NPV 68.6% (50.7–83.1) and PPV 73.4% (66.9–79.2; Figure 3). Overall accuracy was AUROC 0.76 (0.70–0.82). Table 3 presents scenarios for number biopsied and outcomes if an mpMRI score of ≥3 or ≥4 were used to designate a positive test.

Figure 3

Bar chart with associated contingency table demonstrating the histological outcome on TTPM biopsies for each MRI score when using the secondary definition of clinically significant prostate cancer (Gleason ≥3+4 and/or maximum CCL ≥4 mm).

There were no serious adverse events resulting from mpMRI. Serious adverse events resulting from TTPM biopsies occurred in 9 (3.6%). Adverse events were assessed in 236 in a median of 38±56 days after biopsy. Haematuria was reported in 220 (93.2%), poor urine flow in 108 (45.8%) and urinary retention in 56 (23.7%). Urinary tract infection was diagnosed in 23 (9.8%) and perineal skin infection in 8 (3.4%). Rectal pain was reported in 59 (25.1%), perineal pain in 95 (40.3%) and perineal bruising in 136 (57.6%). De novo erectile dysfunction occurred in 20.8%, with two men requiring oral medication and the others recovering erectile function spontaneously after 3–6 weeks.

Discussion

Our PICTURE trial results show that mpMRI in men who require repeat biopsies is able to accurately rule out clinically significant prostate cancer as shown by a high sensitivity and NPV. If men with an mpMRI reported as 1 or 2 wish to avoid a biopsy the probability of significant cancer is 1 in 10. In our study, this amounted to two cases with 6 mm of Gleason 3+3 and a third case of 2 mm of Gleason 4+3.

Our study has some limitations. First, the proportion scoring 1 or 2 was small (14%), leading to low specificity. Second, our findings relate to an expert centre, and whether they are reproducible in non-expert centres requires further evaluation. Third, we were not able to report to the PIRADS system because this protocol was set up before the PIRADS reporting schema was published. A future study will need to compare the two reporting systems. Last, our overall accuracy, demonstrated by the AUROC value of 0.74, is somewhat lower than the 0.80 value that is widely accepted to be indicative of an optimal diagnostic test. This is due to the poor specificity of mpMRI and further reinforces that, when suspicious, mpMRI cannot replace biopsy owing to a high rate of false positives. It also alludes to the consequent detection of insignificant cancer, especially when a score threshold of 3 is used to designate a suspicious mpMRI.

Recent systematic reviews assessing the diagnostic accuracy of mpMRI found sensitivities ranging from 58 to 96%, specificity 23 to 87% and NPV 63 to 98% (de Rooij et al, 2014; Fütterer et al, 2015). The wide ranges reflected differences in mpMRI protocols, reference standards, study populations, disease prevalence and mpMRI reporting.

Our study relates to a heterogeneous patient population who had a previous biopsy that was either positive or negative. We mitigated any bias this might cause by our blinding strategy. Further, heterogeneity improves external validity, although we would advise caution in applying our results to the biopsy-naive population. The role of mpMRI in biopsy-naive men is the subject of another study, PROMIS, which has been reported recently (Ahmed et al, 2017).

A number of definitions of clinically significant prostate cancer are available that could have been used to define the target condition on the reference test (Lord et al, 2011; Valerio et al, 2016). We decided to use histological thresholds developed to stratify TTPM-biopsy outcomes. Other classification systems such as the commonly used Epstein criteria are based on using maximum CCL and the number of positive cores from TRUS biopsy and cannot be applied to TTPM biopsies.

The prevalence of clinically significant disease was high in PICTURE, which might be related to the fact that men with large glands (>80 ml) were unable to enter the study, as TTPM biopsy would not be possible owing to bony pubic arch interference. However, such a high prevalence has been seen by others when applying TTPM biopsies to this group of men (Bittner et al, 2015); thus, it is possible that existing thresholds for clinical significance might need to be raised (Bratt et al, 2015; Valerio et al, 2016).

Currently, men who require a repeat prostate biopsy face either a further TRUS biopsy or TTPM biopsies. Some are using urinary or serum biomarkers to decide who should proceed to a repeat prostate biopsy. Transrectal ultrasound-guided biopsy can continue to misclassify, with men sometimes requiring a third or fourth biopsy (Abraham et al, 2015). In addition, TRUS biopsies carry a risk of infection (1–4%) and rising levels of life-threatening sepsis (0.1%) as they traverse contaminated rectal mucosa, with most men experiencing discomfort and bleeding (Loeb et al, 2013).

On the other hand, TTPM biopsies are highly accurate (Crawford et al, 2005) and accurately attribute prostate cancer risk (Valerio et al, 2016). In a recent study (Crawford et al, 2013) comparing TTPM biopsies with whole-mount radical prostatectomy specimens, TTPM biopsies detected all but one significant prostate cancer lesion; our protocol used the same 5 mm sampling frame as that study. However, TTPM biopsies require general anaesthesia and have a side-effect profile that is higher than that of TRUS biopsy. The degree of morbidity, as assessed robustly in PICTURE, is high. The other disadvantage of TTPM biopsies is the risk of overdetection of clinically insignificant cancers (Valerio et al, 2016).

Serum and urinary biomarkers also hold promise and offer the potential of a simple noninvasive test that might allow men to decide whether to avoid a repeat biopsy (Leapman et al, 2016). However, two biomarkers (prostate health index and PCA3) were recently deemed less accurate and less cost-effective when compared with an imaging-based pathway by the UK National Institute for Health and Care Excellence (NICE Diagnostics Guideline, 2015). We acknowledge that other fluid biomarker panels, such as the 4-kallikrein panel, have demonstrated good performance characteristics and are undergoing further evaluation.

It is against these options that the role of mpMRI is considered. A man who is currently advised to undergo a repeat biopsy is faced with a choice between an inaccurate test that confers a risk of sepsis (TRUS biopsy) and a highly accurate test that requires a general anaesthetic and confers other forms of morbidity but a lower risk of sepsis (TTPM biopsies). He and his physician, upon looking at the performance characteristics of mpMRI in an expert centre, may wish to use this before a decision about repeating the biopsy.

Whether the use of mpMRI before biopsy might be cost-effective requires further research (Willis et al, 2014; Cerantola et al, 2016). With an estimated one million prostate biopsies occurring every year in the United States and 300 000 men undergoing repeat biopsies, the upfront costs of an mpMRI triage test need to be offset against the potential benefit of 30 000 fewer biopsies and fewer cases of clinically insignificant cancer, which often get treated unnecessarily. Nonetheless, one should not underestimate the issues of cost, skills and expertise in reporting mpMRI and carrying out targeted biopsies (Nassiri et al, 2015).

Conclusion

In men currently advised to have a repeat prostate biopsy, prostate mpMRI could be used to safely avoid a repeat biopsy in 14% while detecting 97% of clinically significant prostate cancers. However, men with a non-suspicious mpMRI who avoid an immediate biopsy should be told of the false-negative rate associated with such a strategy and undergo clinical follow-up. In addition, the high prevalence of suspicious mpMRI scans, when a score of 3 is used to indicate a positive mpMRI leading to biopsy, can also lead to overdiagnosis of insignificant cancers. Further research is required to determine whether targeted biopsies in conjunction with systematic biopsies or alone can achieve the high sensitivity seen in our PICTURE study.