Main

Clinically insignificant prostate cancer (PC) can be defined as a cancer that will not affect the patient during the natural course of his lifetime. The indolent course of localised low-grade PC means that active surveillance (AS) is considered a good treatment option (Albertsen et al, 2005). Recent data regarding the apparent overtreatment of screening-detected PC have provoked much debate regarding the way low-risk PC is treated. Generalising conclusions from studies in populations where PSA screening is common to populations where PSA testing is relatively uncommon is misleading.

Attempts have been made to define clinically insignificant PC, based on pathological examination of radical prostatectomy specimens and analysis of recurrence rates of these tumours. Several definitions have been formulated. Work by Stamey et al (1993) identified organ-confined tumours of <0.5 cm3 Gleason 3+3 with no grade 4 or 5 as insignificant. Subsequently, Wolters et al (2011) analysed the data from the European Randomised Study of screening for PC (ERSPC) (van den Bergh et al, 2009) to redefine insignificant PC as organ-confined Gleason 3+3 tumours, with no grade 4 or 5, the largest tumour having a volume ⩽1.3 cm3 and a total tumour volume of ⩽2.5 cm3. Evidence of tumour volume as an independent predictor of outcome in PC is conflicting, and therefore we considered definitions of insignificant PC that include and exclude tumour volume.

A number of tools have been developed to identify men who are harbouring insignificant PC. The tools are based on parameters defined by clinical examination, serum PSA, transrectal ultrasound findings and biopsy results. These tools have been generated in populations where PSA screening is common from USA (Soloway et al, 2010; Adamy et al, 2011; Tosoian et al, 2011; Whitson et al, 2011) and from the ERSPC study (van den Bergh et al, 2009). In the UK, the uptake of PSA testing is ∼6% (Melia et al, 2004; Parker et al, 2006). We sought to evaluate the accuracy of these tools in identifying insignificant PC (by three different definitions) in a cohort of patients undergoing radical prostatectomy.

Objectives. First, we sought to evaluate the accuracy of tools designed to identify insignificant PC in a cohort of patients undergoing radical prostatectomy in an unscreened population.

Second, we sought to evaluate the effect of PSA screening on the accuracy of these tools.

Materials and Methods

Data were collected prospectively for 847 patients undergoing robotic radical prostatectomy between July 2007 and October 2011 at Addenbrooke's Hospital, Cambridge, UK. The PSA level was measured and clinical stage assigned by the attending urologist according to the 2002 TNM staging system. All patients had their biopsy and radical prostatectomy specimens evaluated at our institution by genitourinary pathologists according to protocols from the Royal College of Pathologists (2009).

Data were reviewed retrospectively and 445 patients who had Gleason 3+3 on biopsy were identified. Of these, in 415 patients, a specialist uropathologist had estimated tumour volume in the radical prostatectomy specimen at the time of initial histopathological evaluation using a visual inspection method of assessment of marked areas of tumour on whole-mount haematoxylin and eosin stained sections from the entirely embedded prostates (van der Kwast et al, 2011).

The histology sections were divided in halves or quadrants to fit cassettes for paraffin embedding. Pathological tumour stage and Gleason grade were assessed. Tumour areas were marked on each slide and measured by visual estimation, giving a percentage of the total volume of prostate occupied by tumour (van der Kwast et al, 2011). Subsequently, the volumes of tumour were calculated by multiplying the percentage occupied area by the gland volume (itself calculated by measurement of diameters of resected prostate specimen H × W × L × π/6 (MacMahon et al, 2009)). For 537 patients, the weight of the prostate specimen with seminal vesicles removed was available. Where diagnostic TRUS was performed at the referring centre, we were not able to calculate PSA density using TRUS volume. TRUS-estimated volume was available for 256 patients. In these patients, we used the calculated volume of the resected prostate to calculate approximate PSA density. Strong agreement was seen between prostate weight and calculated prostate volume, and between TRUS-estimated volume and calculated prostate volume (data not shown).
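The volume arithmetic described above can be sketched as follows. This is a minimal illustration only, assuming diameters in cm (giving volumes in cm3) and PSA in ng ml−1; the function names and the example values are hypothetical and are not taken from the study data.

```python
import math

def ellipsoid_volume_cm3(height_cm, width_cm, length_cm):
    """Gland volume from three diameters: H x W x L x pi/6 (ellipsoid formula)."""
    return height_cm * width_cm * length_cm * math.pi / 6

def tumour_volume_cm3(percent_tumour, gland_volume_cm3):
    """Tumour volume = visually estimated percentage of gland occupied x gland volume."""
    return percent_tumour / 100.0 * gland_volume_cm3

def psa_density(psa_ng_ml, gland_volume_cm3):
    """Approximate PSA density = serum PSA divided by gland volume."""
    return psa_ng_ml / gland_volume_cm3

# Hypothetical specimen: 4.0 x 5.0 x 4.5 cm, 5% occupied by tumour, PSA 6.0
gland = ellipsoid_volume_cm3(4.0, 5.0, 4.5)
tumour = tumour_volume_cm3(5.0, gland)
density = psa_density(6.0, gland)
print(f"gland {gland:.1f} cm3, tumour {tumour:.2f} cm3, PSA density {density:.3f}")
```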

Postoperative assessments included physical examination and PSA at 6 weeks, 3, 9 and 12 months and every 6 months thereafter. Biochemical recurrence was defined as PSA >0.2 ng ml−1.

Five criteria designed to identify insignificant disease were identified (Table 1) and performance compared with D’Amico low-risk criteria (PSA⩽10, Gleason 3+3, cT1-2a) (D'Amico et al, 1998). In addition, the discriminative power of the selection criteria used for a UK AS cohort (Selvadurai et al, 2013) and the effect of the number of biopsy cores (Ochiai et al, 2005) containing cancer were evaluated.
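For illustration, the D’Amico low-risk criteria as stated above (PSA ⩽10, Gleason 3+3, cT1-2a) can be expressed as a simple rule. This is a sketch under stated assumptions: the function name is hypothetical, and the expansion of "cT1-2a" into the individual stage labels T1, T1a, T1b, T1c and T2a is our reading of the range rather than a list given in the text.

```python
# Assumed expansion of the range "cT1-2a" into individual stage labels.
LOW_RISK_STAGES = {"T1", "T1A", "T1B", "T1C", "T2A"}

def is_damico_low_risk(psa_ng_ml, gleason_primary, gleason_secondary, clinical_stage):
    """Return True if PSA <= 10, biopsy Gleason is 3+3 and clinical stage is T1-T2a."""
    # Accept "cT1c" or "T1c": drop an optional leading clinical prefix, ignore case.
    stage = clinical_stage.strip().lstrip("cC").upper()
    return (
        psa_ng_ml <= 10
        and gleason_primary == 3
        and gleason_secondary == 3
        and stage in LOW_RISK_STAGES
    )

# Hypothetical patients
print(is_damico_low_risk(6.5, 3, 3, "cT1c"))   # meets all three conditions
print(is_damico_low_risk(12.0, 3, 3, "cT1c"))  # PSA above 10
```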

Table 1 Components of tools by which to identify patients bearing insignificant PC

Three different criteria were used to define insignificant PC:

  1. Classical definition = organ-confined tumours of <0.5 cm3 Gleason 3+3 with no Gleason 4 or 5 (Stamey et al, 1993).

  2. ERSPC definition = organ-confined Gleason 3+3 tumours, with no Gleason grade 4 or 5, index tumour volume ⩽1.3 cm3 and a total tumour volume of ⩽2.5 cm3 (Wolters et al, 2011).

  3. Inclusive definition = organ-confined Gleason 3+3 tumours, with no Gleason grade 4 or 5.
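The three definitions above can be written as predicates on the prostatectomy findings. The sketch below is illustrative only: the `Specimen` record and its field names are hypothetical, and the classical threshold is applied to total tumour volume, which is an assumption because the text does not state whether the <0.5 cm3 limit refers to total or index tumour volume.

```python
from dataclasses import dataclass

@dataclass
class Specimen:
    """Hypothetical record of radical prostatectomy pathology."""
    organ_confined: bool
    has_grade_4_or_5: bool
    total_tumour_volume_cm3: float
    index_tumour_volume_cm3: float

def inclusive(s: Specimen) -> bool:
    """Organ-confined Gleason 3+3 with no grade 4 or 5; volume ignored."""
    return s.organ_confined and not s.has_grade_4_or_5

def classical(s: Specimen) -> bool:
    """Inclusive definition plus tumour volume < 0.5 cm3 (assumed: total volume)."""
    return inclusive(s) and s.total_tumour_volume_cm3 < 0.5

def erspc(s: Specimen) -> bool:
    """Inclusive definition plus index tumour <= 1.3 cm3 and total <= 2.5 cm3."""
    return (
        inclusive(s)
        and s.index_tumour_volume_cm3 <= 1.3
        and s.total_tumour_volume_cm3 <= 2.5
    )

# Hypothetical specimen: insignificant by ERSPC and inclusive, not by classical
example = Specimen(True, False, total_tumour_volume_cm3=2.0, index_tumour_volume_cm3=1.2)
print(classical(example), erspc(example), inclusive(example))
```

Written this way, the nesting is explicit: any specimen meeting the classical or ERSPC definition also meets the inclusive definition.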

Receiver operator characteristics (ROC) were plotted for each of the tools in predicting insignificant disease by each of the three definitions and the area under the ROC curve (AUC) calculated. Comparisons with the D’Amico tool were made using the DeLong method (DeLong et al, 1988).
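As a worked illustration of the AUC for a selection tool, the sketch below computes the AUC as the Mann-Whitney probability that a randomly chosen patient with insignificant disease is scored higher by the tool than a randomly chosen patient with significant disease, with ties counted as one half. For a dichotomous tool (eligible or not eligible) this reduces to (sensitivity + specificity)/2. The analysis reported here was performed in Stata; this code, its function name and its toy data are assumptions for illustration, and the DeLong comparison itself is not reproduced.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data (hypothetical): 1 = tool says insignificant / truly insignificant
tool_says_insignificant = [1, 1, 0, 0, 1, 0, 0, 1]
truly_insignificant = [1, 0, 0, 0, 1, 1, 0, 0]

auc = auc_mann_whitney(tool_says_insignificant, truly_insignificant)
sensitivity = 2 / 3  # 2 of the 3 truly insignificant cases flagged by the tool
specificity = 3 / 5  # 3 of the 5 significant cases not flagged by the tool
print(round(auc, 4), round((sensitivity + specificity) / 2, 4))
```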

The D’Amico risk stratification tool was not designed to facilitate selection of patients for AS but as a means to predict outcome following radical therapy (D’Amico et al, 1998). It is included here as a benchmark for comparison with tools designed for this purpose. Its use as a predictor of outcome in other treatment modalities has been taken up in the UK (Graham et al, 2008).

Sensitivity, specificity, positive predictive value and negative predictive value were calculated to assess the performance of each tool under each of the three criteria. We also considered two additional tools: ‘any criteria’, which saw the patient recorded as eligible for AS if any one of the tools indicated he was low risk, and ‘all criteria’, which saw the patient recorded as eligible for AS only if all of the tools indicated he was low risk.
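The performance measures and the two composite tools described above can be sketched as follows. The function names and the toy data are hypothetical; "positive" here means that a tool classifies the patient as eligible for AS, and "truth" means that the prostatectomy specimen met the chosen definition of insignificant PC.

```python
def diagnostic_metrics(predicted, actual):
    """Sensitivity, specificity, PPV and NPV from paired boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def any_criteria(tool_results):
    """Eligible if at least one tool classifies the patient as low risk."""
    return [any(row) for row in zip(*tool_results)]

def all_criteria(tool_results):
    """Eligible only if every tool classifies the patient as low risk."""
    return [all(row) for row in zip(*tool_results)]

# Toy data (hypothetical): two tools, four patients
tool_a = [True, True, False, False]
tool_b = [True, False, True, False]
truth = [True, False, False, False]

any_metrics = diagnostic_metrics(any_criteria([tool_a, tool_b]), truth)
all_metrics = diagnostic_metrics(all_criteria([tool_a, tool_b]), truth)
print(any_metrics)
print(all_metrics)
```

The toy example shows the expected trade-off: the ‘any criteria’ tool selects more patients at the cost of specificity, whereas the ‘all criteria’ tool is the most stringent.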

The tools for identifying patients suitable for AS were examined using univariable Cox regression. Unless otherwise stated, P-values were calculated based on two-sided hypothesis tests and were corrected for multiple comparisons using a Bonferroni correction. A cutoff of 0.05 was used to determine statistical significance. Statistical analysis was performed using Stata 11 (StataCorp LP 2009, College Station, TX, USA).
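The Bonferroni correction mentioned above multiplies each P-value by the number of comparisons (capped at 1) before comparison with the 0.05 cutoff. A minimal sketch follows; the function name and the example P-values are hypothetical and do not come from the study.

```python
def bonferroni_adjust(p_values):
    """Multiply each P-value by the number of comparisons, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

ALPHA = 0.05  # cutoff for statistical significance stated in the text

# Hypothetical raw P-values from three comparisons
raw = [0.004, 0.02, 0.3]
adjusted = bonferroni_adjust(raw)
significant = [p < ALPHA for p in adjusted]
print(adjusted, significant)
```

In this example the second comparison is nominally significant (0.02) but no longer meets the cutoff once adjusted for three comparisons.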

Results

Of the 847 patients treated with robotic radical prostatectomy, 415 had Gleason 3+3 disease on their diagnostic biopsy and tumour volume estimated after prostatectomy. Of these, 206 patients had D’Amico low-risk PC (D'Amico et al, 1998). It has been shown that the number of cores affected by cancer on diagnostic biopsy correlates with tumour volume (Ochiai et al, 2005). Only 63 patients undergoing surgery had two or fewer cores containing cancer on diagnostic biopsy (all biopsies had 10–12 cores taken) and these men had expressed a strong desire not to pursue AS after careful counselling about the relative merits of AS.

Two hundred and nine out of 415 (50.3%) had Gleason 4 or 5 PC, 131 out of 415 (31.6%) demonstrated extra-capsular extension and 1 out of 415 (0.2%) had positive nodes at surgery, a rate of misclassification in keeping with other non-USA series (Kim et al, 2013).

The selection criteria evaluated vary in their stringency. The more stringent criteria render fewer patients eligible for AS, although the number of patients offered AS for significant disease is decreased. Table 2 demonstrates this principle: of the 44 patients selected for AS by the Tosoian criteria, only 3 (7%) had pT3 tumours, whereas 39 of the 169 (23%) patients eligible under the Whitson criteria had pT3 disease. The less-stringent criteria tended to select patients bearing larger tumours. The receiver operator characteristics are shown in Tables 3 and 4. No significant difference in discriminative power is demonstrated for any tool over D’Amico low-risk criteria other than a significant inferiority for the Selvadurai criteria in identifying insignificant disease by the ERSPC definition.

Table 2 Pathological tumour characteristics in patients identified as bearing insignificant PC by the selection tools described
Table 3 Comparison of the diagnostic accuracy of selection tools in identifying patients bearing insignificant PC defined in three different ways
Table 4 Comparison of the areas under the receiver operator characteristic curves of the selection tools

From a pragmatic point of view, the urologist needs to know how many patients identified as having insignificant PC by each set of criteria will turn out to be harbouring significant disease. Table 5 demonstrates the stringency and discriminative power of each tool.

Table 5 Number of patients identified correctly and incorrectly as bearing insignificant PC and therefore being eligible for active surveillance

Iremashvili et al (2012) compared accuracy of the five tools in predicting insignificant PC in a cohort of North American patients. Table 6 demonstrates the relative accuracy of these tools in our and the Iremashvili cohorts. The AUC did not exceed 0.7 for any of the tools evaluated in either the UK or USA cohorts. Consensus suggests that a test with an AUC of <0.7 is of limited discriminative power (DeLong et al, 1988). In the USA data, the AUCs approached 0.7 for some of the tools predicting insignificant PC by each of the definitions. In the UK cohort none of the AUCs exceeded 0.6.

Table 6 Comparison of accuracy of tools in unscreened UK and screened USA data sets

One limitation of this study is that this cohort of surgical patients may not necessarily be representative of all patients with Gleason 3+3 PC on biopsy. The small number of classically defined insignificant PC (n=18) seen limits our ability to analyse this subgroup.

Discussion

The recent publication of the PIVOT trial data has provoked much debate (Wilt et al, 2012). PSA screening is commonplace in the USA (for example, in the PLCO randomised study of PSA screening, 44% of the men in the ‘unscreened’ arm had already undergone PSA testing at enrolment, and this increased to 77% after 5 years on the study (Andriole et al, 2009)). In the UK, PSA testing is performed on an ad hoc basis and it is estimated that only 6% of men will have had a PSA test (Melia et al, 2004; Parker et al, 2006). In the USA, the PIVOT study failed to demonstrate a difference in overall survival for men with low-risk PC treated with radical surgery or observation. Conversely, the SPCG-4 study, from an unscreened Scandinavian population, demonstrated a significant survival benefit to radical prostatectomy (Bill-Axelson et al, 2011). Analyses of PIVOT trial data compared with SPCG-4 data suggest that PSA screening results in overdiagnosis and a lead-time effect (Xia et al, 2013). It has been estimated using the ERSPC data that this lead time is 11 years (Draisma et al, 2003). Assuming that tumours grow with time, it is logical that screen-detected PCs will be of lower volume than their non-screen-detected counterparts, as demonstrated by our data. Examination of ERSPC data reveals a median tumour size of 0.47 cm3 (Draisma et al, 2003). Only 4% of our cohort had a tumour size ⩽0.47 cm3, and the median tumour volume in our cohort was 3.47 cm3 (see Figure 1).

Figure 1

Bar chart of distribution of tumour volumes in screened and unscreened populations (we acknowledge technical differences in the method of calculation of tumour volume).

The accuracy of a diagnostic test can be evaluated by calculating the AUC. The AUC for a random act, like a coin toss, is 0.5. The accepted rule of thumb is that an AUC of <0.7 indicates poor discriminative power; between 0.7 and 0.8 indicates acceptable discrimination; and >0.8 indicates excellent discriminative ability (DeLong et al, 1988).
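The rule of thumb above can be stated as a small lookup. This sketch is illustrative only; the function name and category labels are hypothetical, and treating an AUC of exactly 0.8 as "acceptable" is an assumption, since the text does not say which band the boundary value belongs to.

```python
def interpret_auc(auc):
    """Map an AUC to the rule-of-thumb bands quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.7:
        return "poor"
    if auc <= 0.8:  # boundary handling at 0.8 is an assumption
        return "acceptable"
    return "excellent"

# AUC values chosen for illustration
for value in (0.50, 0.60, 0.75, 0.85):
    print(value, interpret_auc(value))
```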

Iremashvili et al (2012) published data examining the accuracy of protocols for selecting patients for AS in a USA cohort. The AUCs ranged from 0.61 to 0.68 for classically defined insignificant PC, 0.59 to 0.64 for the updated definition and 0.55 to 0.62 for organ-confined low-grade disease. This compares with 0.51 to 0.60, 0.54 to 0.57 and 0.54 to 0.59, respectively, in our cohort. None of the selection tools evaluated had an AUC of >0.7 in either cohort. There was a trend towards the AUCs being lower in the UK cohort compared with corresponding USA data. We hypothesise that tools based on number or length of biopsy cores are rendered inaccurate due to greater variability in the size of PCs detected.

The finding of two or fewer cancer-containing cores has been shown to correlate with tumour volume in US series (Ochiai et al, 2005). Selecting only patients with two or fewer cores on biopsy or very low-risk disease (D’Amico low risk with two or fewer cores affected) did not improve accuracy in detecting insignificant disease over D’Amico risk stratification alone (Tables 3 and 4).

The recent recommendations of the US Preventive Services Task Force on PC screening (Moyer, 2012) and the AUA (Carter et al, 2013) may result in changes in the uptake of PSA screening worldwide with potential effects on the accuracy of these tools.

Data from maturing AS cohorts suggest the short-term safety of an inclusive approach (Klotz, 2013; Selvadurai et al, 2013). However, 30% of patients on AS undergo radical treatment within 5 years. Upgrading and upstaging at surgery were common events in our cohort. It is possible that with increasing follow-up the number of patients failing conservative therapy will increase and while they are on AS they continue to age, rendering them less suitable for surgical treatment.

Considered together these data suggest a substantial underdiagnosis and would justify exhaustive investigation early in the course of an AS programme. The approach in our unit, in an attempt to minimise underdiagnosis, following a diagnosis of low-risk PC is to perform a multiparametric MRI with repeat TRUS at 3 months. The MRI is performed after a suitable period following the diagnostic biopsy to exclude false positives due to biopsy artefact. If the MRI demonstrates a lesion likely to have been missed on TRUS-guided biopsy, targeted transperineal biopsy is performed. Inter-operator variability in interpreting MRI means that in some centres the use of systematic transperineal template biopsy is rational (Ayres et al, 2012). Where expertise in MRI is available, targeted biopsy may help rule out aggressive disease for those men who are unwilling to accept the inaccuracy associated with staging and grading of PC by TRUS biopsy, PSA and clinical examination.

This study highlights the uncertainty associated with selecting men for AS; although long-term data from AS series are awaited, it is clear that there is an urgent need for a biomarker by which to differentiate indolent from aggressive PC. Heterogeneity within PC tumours means that the sampling error inherent to random or systematic needle biopsy of the prostate will continue to result in misdiagnosis. This error is unlikely to be negated using genomic or expressome analysis of tumour samples, where sampling error is still a problem. Further work should focus on imaging tests where areas suspicious for aggressive cancer can be identified and targeted for biopsy. Multiparametric magnetic resonance imaging is currently the best way to identify significant PC. In the future, positron emission tomography using tracers taken up specifically by aggressive tumour foci may prove useful.

Conclusions

Tools to identify insignificant PC are inaccurate in an unscreened population. This illustrates the great caution needed when using a tool developed and validated in one population in another population, particularly when PSA screening is a confounding factor. In counselling patients for AS, the surgeon should be explicit regarding uncertainty in predicting stage/grade despite apparent short-term safety. There is an urgent need for development of a means by which to exclude aggressive PC in patients wishing to undergo conservative treatment.