Main

In the western world, bladder cancer is the 4th most common cancer in men and the 10th in women.1 Most (70–80%) bladder cancers are non-muscle invasive at first diagnosis (pTa, pT1, carcinoma in situ).2, 3, 4, 5 Generally, the prognosis is good although 30–80% will recur and 1–45% will progress within 5 years.3, 4, 5, 6, 7 Stage and grade are the most important variables for progression as opposed to size and multiplicity, which are stronger predictors of recurrence.5, 6, 7 Historically, both three-tiered (1966-Bergkvist, 1973-WHO, 1998-WHO/ISUP and 2004-WHO) and four-tiered (1982-Modified Bergkvist and 1999-WHO) grading systems for bladder cancer have been described.5, 8

Point mutations in the FGFR3 gene are well documented in inherited skeletal anomalies associated with dwarfism, such as achondroplasia and thanatophoric dysplasia.9 An oncogenic role has been proposed for mutant FGFR3 in bladder cancer.10 Surprisingly, FGFR3 mutations were related to favorable bladder cancer with 84% of pTaG1 tumors having a mutation as compared with 7% of ≥pT2G3 lesions.11 In non-muscle invasive bladder cancer, FGFR3 mutations were associated with a low risk for progression12, 13, 14, 15 and genetically stable disease.16, 17 On the other hand, increased expression of the proliferation marker Ki-67, nuclear overexpression of P53 and decreased expression of the cell-cycle inhibitor P27 have been linked to higher grade and progressive non-muscle invasive bladder cancer.12, 14, 18 These four molecular markers are well-established representatives of the two major pathways in bladder cancer carcinogenesis.12

Currently, the simultaneous use of two classification systems for grade is advocated by the American Urological Association3 and European Association of Urology4 guidelines because the WHO2004 classification19 has not been sufficiently validated against the WHO1973 system20 with biological markers and follow-up. The main reasons to propose a new classification were the lack of clear definitions for the three WHO1973 grades and the high percentage of bladder cancer classified as G2 (the default diagnosis). The WHO 2004 classification system was proposed in 1998 by the WHO/International Society of Urological Pathology and adopted as a three-tiered (papillary urothelial neoplasms of low malignant potential (LMP), low-grade (LG) and high-grade (HG) urothelial cancer) grading system by the WHO in 2004.19, 21 The new system had two intended advantages. First, it is based on detailed histological criteria. Second, patients with LMP were expected to have lower recurrence rates and patients with LG non-muscle invasive bladder cancer were expected to show close to no progression compared with HG cases. The new classification also led to a grade shift that precludes a one-to-one translation between the two grading systems: WHO1973 G1 tumors became WHO2004 LMP or LG, G2 became LG or HG and G3 are all HG (Figure 1).5, 19, 20, 21, 22, 23, 24, 25

Figure 1

Comparison of the two (WHO 1973 and 2004) classification systems for grade in bladder cancer, which are both currently advocated by clinical guidelines. We hypothesized that the biological and clinical potential across the WHO1973 and 2004 systems would increase along the grades LMP, G1, LG, G2, HG and G3 as shown by the dashed arrows in the figure. Abbreviations: LMP, papillary urothelial neoplasia of low malignant potential; LG, low grade—papillary urothelial carcinoma; HG, high grade—papillary urothelial carcinoma; G1, grade 1; etc.; WHO, World Health Organization.

In the present study, we hypothesized that the biological potential and clinical course (recurrence and progression) across the WHO1973 and 2004 systems would increase as follows: LMP<G1<LG<G2<HG<G3 (Figure 1). To test this hypothesis, we first performed clinical and molecular validation of each WHO classification. Second, we performed clinical and molecular cross-validations between the two WHO classifications to explore differences between LMP, LMP/G1, G1/LG, LG/G2, G2/HG and HG/G3. This led to the proposal of a four-tiered classification system for grade.

Materials and methods

Patients

The study group consisted of 325 primary (first diagnosis) non-muscle invasive bladder cancers from two hospitals in Rotterdam, The Netherlands (Sint Franciscus Hospital: N=117 and Erasmus MC: N=121) and one hospital in Toronto, Canada (University Health Network, Princess Margaret Hospital: N=87) in the period 1986–2006. None of the patients had a documented hereditary skeletal disorder. Mean age at diagnosis was 66.4 years (standard deviation: 11.8 years). All patients underwent a macroscopically complete trans-urethral resection with muscle in the specimen. Random biopsies, a standard re-resection and a single instillation of chemotherapy after resection were not routinely done in the study era. Surveillance consisted of urethro-cystoscopy and cytology every 3–4 months in the first 2 years and at a lower frequency, every 6–12 months, thereafter if no recurrence was detected during follow-up. Upper urinary tract imaging was done every 2 years in high-risk cases or if indicated by clinical suspicion. Recurrence was defined as the development of histologically confirmed urothelial cancer in follow-up. Progression was defined as recurrent disease in the muscularis propria (≥pT2) and/or metastasis. Patients were followed until their last clinical visit, progression or death.

Pathology Review

We used newly cut or archived (4 μm), hematoxylin and eosin (H&E) stained histological sections from archival tumor tissue, which was fixed in 10% buffered formaldehyde, dehydrated and embedded in paraffin. Histological grade was reviewed by one uro-pathologist (TvdK) for the WHO1973 and 2004 classification systems. This was done in two separate rounds. The pathologist was unaware of molecular and/or clinical data.

Immuno-Histochemistry

All cases were routinely processed in two laboratories (Erasmus MC, Rotterdam and Mount Sinai Hospital, Toronto). Monoclonal antibodies against Ki-67 (clone MIB-1), P53 (clone DO-7) and P27 (clone 1B4 (Rotterdam) or clone 57 (Toronto)) were used. Known positive and negative controls were included in each run. The slides were independently assessed by BvR/AV (Rotterdam) and by BvR/TvdK (Toronto). The cutoff levels were 25% (Ki-67), 10% (P53) and 50% (P27).12, 15 Immuno-histochemical staining was performed on whole slides for the Rotterdam cases and on tissue microarrays for the Toronto cases.
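
The cutoff rule can be sketched in a few lines of Python. This is an illustration only, not the scoring procedure used in the study: the thresholds and their directions (Ki-67 >25%, P53 >10%, P27 <50%) are taken from the text and the figure legends, whereas the names `CUTOFFS` and `is_altered` and the strict handling of values exactly at the cutoff are assumptions.

```python
# Cutoffs from the text: Ki-67 > 25%, P53 > 10% and P27 < 50% count as altered.
# The dictionary layout and the function name are illustrative assumptions.
CUTOFFS = {
    "Ki-67": (25.0, "above"),  # altered when more than 25% of cells stain
    "P53": (10.0, "above"),    # altered when more than 10% of nuclei stain
    "P27": (50.0, "below"),    # altered when fewer than 50% of cells stain
}


def is_altered(marker: str, percent_positive: float) -> bool:
    """Return True if the staining percentage counts as altered expression."""
    threshold, direction = CUTOFFS[marker]
    if direction == "above":
        return percent_positive > threshold
    # Assumption: a value exactly at the cutoff is treated as not altered.
    return percent_positive < threshold
```

Note that P27 runs in the opposite direction from Ki-67 and P53: for P27, loss of staining is the adverse finding.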

FGFR3 Mutation Analysis

H&E slides served as templates for the manual micro-dissection procedure on the tissue-block. In case of multifocality, the papillary lesion with the highest grade/stage was taken. The largest tumor was taken if grade/stage was the same for multiple cancers. DNA was extracted using the DNeasy Tissue kit (Qiagen, Hilden, Germany). The FGFR3 mutation analysis was performed in two institutes (Erasmus MC, Rotterdam and Mount Sinai Hospital, Toronto) with the PCR-SNaPshot method.15 In brief, three regions (exons: 7/10/15) were simultaneously amplified by PCR. After removal of excess primers and deoxynucleotides, specific SNaPshot primers able to detect 11 known oncogenic FGFR3 mutations were annealed to the PCR products, separated by capillary electrophoresis and analyzed in an automatic sequencer.15 The codon numbering was according to the FGFR3b isoform expressed in epithelium.11

Statistics

The SPSS computer software version 21.0 was used for data documentation and analysis. Two-sided Fisher’s exact test was used to evaluate differences between molecular markers and the two WHO classification systems (molecular validation) and between categories of the two WHO systems (molecular cross-validation). The clinical outcome (recurrence and progression) in relation to tumor grade and the four molecular markers was analyzed using Kaplan–Meier statistics (clinical validation). Kaplan–Meier statistics were also applied to test differences in clinical outcome for a given grade category in one WHO system with two categories from the other WHO system (clinical cross-validation). Multivariable Cox regression analyses (backward stepwise method with P to enter <0.05 and P to remove >0.10) were performed to find independent prognostic variables among multiplicity, tumor size, carcinoma in situ, grade (WHO 1973, WHO 2004 and combined four-tiered grading system) and stage. A binary logistic regression model (backward stepwise method with P to enter <0.05 and P to remove >0.10) was constructed to predict stage (pTa or pT1) with the variables mentioned above. Multivariable testing was done with and without the molecular markers (FGFR3, Ki-67, P53 and P27) and by including each of the three classification systems for grade combined and separately. Statistical significance was assumed if P<0.05.
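
For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table (for example, marker status against two grade categories) can be sketched as follows. This is a minimal standard-library illustration, not the SPSS implementation used in the study. It uses one common definition of the two-sided P-value (the sum of the probabilities of all tables no more likely than the observed one); the function name and the small tolerance factor are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    # Sum over all tables that are no more likely than the observed table.
    # The factor (1 + 1e-7) guards against floating-point ties.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)
```

Any counts passed to this function for illustration would be hypothetical; the study's actual marker distributions are those reported in Tables 3 and 4.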

Results

The patient, tumor and molecular characteristics of the studied population are shown in Table 1. Figure 2 contains examples of the FGFR3 mutation analysis and altered expression of Ki-67, P53 and P27. Table 2 shows the head-to-head comparison between the WHO 1973 and 2004 classification systems for grade and stage. Data sets over time at 5-year intervals (1986–1990, 1991–1995, 1996–2000 and 2001–2006) did not show any differences related to pathological and clinical outcome (data not shown). As expected from Figure 1, the WHO 2004 system leads to grading into higher categories. This result was more pronounced in less differentiated non-muscle invasive bladder cancer.

Table 1 Patient, tumor and molecular characteristics
Figure 2

Examples of PCR-SNaPShot (FGFR3 mutation analysis) and altered expression of Ki-67, P53 and P27. (a) Examples of the FGFR3 mutation analysis are shown. The top-example shows the wild-type gene with the codon numbering according to the FGFR3b isoform. Below are the two most common FGFR3 mutations in bladder cancer, ie, S249C and Y375C which we found in 126 (67%) and 30 (16%) tumors, respectively. The colors referring to specific nucleotides are as follows: G=blue, A=green, T=red and C=black. Orange is a size marker. (b) Examples of altered expression of Ki-67 (>25%), P53 (>10%) and P27 (<50%) are shown. FGFR3, fibroblast growth factor receptor 3 gene.

Table 2 (A) Comparison of histological grade according to the WHO 1973 and 2004 classification systems. Please note that one pathologist reviewed both classification systems in two separate sessions. (B) Comparison of stage according to the WHO 1973 and 2004 classification systems

Clinical Validation: WHO 1973 and 2004 and Molecular Markers

We determined recurrence and progression as prognostic end points. Recurrence was found in 219 (67%) patients and 63 (19%) patients progressed. Median and mean follow-up for recurrence were 2.1 and 3.4 years (interquartile range; 0.7–5.2 years), respectively. Median and mean follow-up for progression were 6.8 and 7.3 years (interquartile range; 3.8–9.3 years), respectively. Figures 3a and b show that both WHO systems significantly predicted progression, providing clinical validation. Conversely, neither system was able to predict recurrence (WHO 1973; P=0.441/WHO 2004; P=0.552). Figure 4a–d contains the Kaplan–Meier curves providing clinical validation of the FGFR3 gene status and Ki-67, P53 and P27 expression as prognostic markers for progression. The P-values were highly significant (P<0.001). FGFR3 mutations were related to favorable disease whereas altered expression of Ki-67, P53 and P27 was associated with worse outcome in non-muscle invasive bladder cancer. Of the four molecular markers, only Ki-67 (P=0.025) and P53 (P=0.031) expression significantly predicted recurrence.
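
As background for the Kaplan–Meier curves referred to here, the product-limit estimate can be sketched as below. This is a generic textbook illustration, not the SPSS routine used for the analyses; the function name and the input layout (one follow-up time and one event flag per patient) are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, event_free_fraction)] at each distinct event time.

    times  -- follow-up time per patient (for example, in years)
    events -- 1 if the end point (for example, progression) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set afterwards.
        at_risk -= n_leaving
    return curve
```

Censored patients (those without the end point at last follow-up) lower the number at risk without producing a step in the curve, which is why long and complete follow-up matters for the progression estimates.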

Figure 3

This figure shows the Kaplan–Meier curves providing clinical validation of the WHO 1973 (a) and 2004 (b) classification systems for grade to predict progression of non-muscle invasive bladder cancer. The P-values (log-rank) were highly significant (P<0.001) for both grading systems. The P-values (log-rank) comparing G1/G2, G2/G3, LMP/LG and LG/HG were =0.003, <0.001, =0.038 and <0.001, respectively. LMP, papillary urothelial neoplasia of low malignant potential; LG, low grade—papillary urothelial carcinoma; HG, high grade—papillary urothelial carcinoma; G1, grade 1; etc.; WHO, World Health Organization.

Figure 4

This figure shows the Kaplan–Meier curves providing clinical validation of the FGFR3 gene status (a), Ki-67 expression (b), P53 expression (c) and P27 expression (d) to predict clinical progression of non-muscle invasive bladder cancer. The P-values (log-rank) were highly significant (P<0.001) for the four molecular markers. Please note that FGFR3 mutations were related to favorable disease whereas altered expression of Ki-67, P53 and P27 was associated with worse outcome. Ki-67, P53 and P27 expression was considered as altered if >25, >10 and <50%, respectively. FGFR3, fibroblast growth factor receptor 3 gene; mt, mutant; wt, wild type (no mutation).

Molecular Validation: WHO 1973 and 2004

The distribution of the molecular markers across the two WHO classifications is given in Table 3. In general, the frequency of FGFR3 mutations decreased with higher grades and altered expression of Ki-67, P53 and P27 increased with higher grades. In addition, comparing WHO1973 with 2004, FGFR3 mutations were more frequent in HG (38%) than in G3 (18%). On the other hand, altered expression (Ki-67, P53 and P27) was more frequently found in G3 than in HG. Similar differences in molecular profile can be appreciated for LG and G2 but not for LMP and G1 (Table 3). We did not find a difference between laboratories and different antibody use (P27) in relation to pathological variables and clinical outcome (data not shown). Taken together, our results pointed to a significant stepwise increase in biological aggressiveness along the line LMP/G1, LG, G2, HG and G3.

Table 3 Molecular validation of the WHO 1973 and 2004 classification systems

Molecular Cross-Validation: WHO 1973 and 2004

The results from Table 3 prompted us to perform a cross-validation to investigate whether molecular differences can be found within a single grade category of one WHO system by using the other WHO system. The molecular cross-validation of the WHO 2004 system with the WHO 1973 classification system is shown in Table 4A. When we tested the categories G1-2 vs G3 in the HG category, three of the four molecular markers were significantly different. This implied that within the HG category, G3 has a more aggressive biological potential than cases graded G1-2. We also tested G1 vs G2-3 within the LG category. We found that two of the four molecular markers were significantly different (Table 4A). The molecular cross-validation of the WHO 1973 system with the WHO 2004 classification system is shown in Table 4B. Within the G2 category, three of the four molecular markers were significantly different for LMP-LG vs HG. Within the G1 category, none of the four markers differed for LMP vs LG-HG. Taken together, the molecular cross-validations confirmed a significant stepwise increase in biological aggressiveness along the line LMP/G1, LG, G2, HG and G3.

Table 4A Molecular cross-validation of the WHO 2004 system with the WHO 1973 classification system
Table 4B Molecular cross-validation of the WHO 1973 system with the WHO 2004 classification system

Clinical Cross-Validation: WHO 1973 and 2004

Analogous to the molecular cross-validation of the two WHO systems, we also performed clinical cross-validations. As expected, no differences were found for recurrence. For progression, the same cross-validations that were significant in molecular cross-validations were also clinically significant. Figure 5a–c shows that within HG, LG and G2, the corresponding grades from the other WHO classification behaved differently. No difference was found within the G1 category (Plog-rank=0.186). In conclusion, these clinical cross-validations confirmed a significant stepwise increase in progression along the line LMP/G1, LG, G2, HG and G3 as suspected from the molecular cross-validations.
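
The log-rank P-values quoted for these comparisons come from a two-group test that can be sketched as follows. This is a generic illustration of the standard log-rank statistic with one degree of freedom, not the SPSS routine used in the study; the function name and input layout are assumptions.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 degree of freedom."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still under follow-up just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value
```

In a cross-validation such as HG—G1/2 vs HG—G3, group A and group B would be the two subgroups of a single grade category as defined by the other WHO system.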

Figure 5

This figure shows the Kaplan–Meier curves providing clinical cross-validation of the WHO 1973 and 2004 classification systems for grade predicting progression. (a) The groups HG—G1/2 and HG—G3 behave differently (Plog-rank=0.036). (b) Significantly different progression rates for LG—G1 and LG—G2/3 (Plog-rank=0.024) are shown. (c) A significant difference in progression for G2—LMP/LG and G2—HG (Plog-rank=0.013) is shown. Abbreviations: LMP, papillary urothelial neoplasia of low malignant potential; LG, low grade—papillary urothelial carcinoma; HG, high grade—papillary urothelial carcinoma; G1, grade 1; etc.; WHO, World Health Organization.

WHO 1973 and 2004 in pTa and pT1

We also performed the clinical and molecular (cross) validations in pTa and pT1 non-muscle invasive bladder cancer. None of the pT1 tumors was classified as LMP or G1. Therefore, cross-validations in pT1 were limited to the G2 and HG categories. In clinical validations, both WHO classifications were significant predictors for progression in pTa. In pT1, WHO1973 significantly predicted progression (Plog-rank=0.027) whereas WHO2004 did not (Plog-rank=0.098). In molecular validations for pTa and pT1, the same molecular markers (see Table 3) were significant. Molecular cross-validations showed significant differences for FGFR3, Ki-67 and P53 within the pT1-G2 category, for P53 within pT1-HG and for Ki-67 and P53 within pTa-LG and pTa-HG, respectively. Clinical cross-validations were significant for progression within pTa-LG (Plog-rank=0.041) and pTa-HG (Plog-rank=0.043), respectively. The subgroup analyses in pTa and pT1 confirmed the conclusions made in the paragraphs above.

Multivariable Analyses

Multivariable analyses were performed to compare the prognostic value of the different classification systems for grade. For these analyses, the four-tiered system was tested against the two three-tiered systems. Within the proposed four-tiered system, the LMP/LG/G1, LG/G2, HG/G2 and HG/G3 categories comprised 85, 72, 54 and 88 patients, respectively (Table 2A). Twenty-six cases (23 LMP/G2 and 3 HG/G1) were excluded from the multivariable analyses because these could not be assigned to one of the four-tiered categories. Multiplicity, tumor size, carcinoma in situ, grade (WHO1973, WHO2004 and combined four-tiered grading system) and stage were entered in the multivariable models. In univariable analyses for recurrence, multiplicity, tumor size, carcinoma in situ, Ki-67 and P53 were significant. Of these variables, multiplicity and tumor size were significant in multivariable analysis (Table 5A). In univariable analyses for progression, multiplicity, carcinoma in situ, grade WHO1973, grade WHO2004, the combined four-tiered grading system, stage, FGFR3, Ki-67, P53 and P27 were significant (Figure 6). Of these variables, the combined four-tiered grading system and stage were significant in multivariable analysis (Table 5B). If only one of the classification systems for grade was entered in the multivariable analysis for progression, all systems (WHO1973, P=0.009; WHO2004, P=0.022; combined four-tiered system, P=0.001) were significant. The addition of the four molecular markers (FGFR3, Ki-67, P53 and P27) as separate variables to the multivariable analyses to predict recurrence and progression did not change the outcome of these analyses. In univariable analyses to predict stage (pTa/pT1), multiplicity, tumor size, carcinoma in situ, grade WHO1973, grade WHO2004, the combined four-tiered grading system, stage, FGFR3, Ki-67, P53 and P27 were significant. In the logistic regression models (with and without molecular markers as variables) to predict stage (pTa/pT1), the combined four-tiered grading system (P<0.001) and P27 (P=0.028) were significant (Table 5C).

Table 5 Significant variables, hazard ratios and 95% confidence intervals in the multivariate analyses for the prediction of recurrence (A) and progression (B) are shown. For the prediction of stage (pTa/pT1) (C), significant variables, odds ratios and 95% confidence intervals in the logistic regression analyses are shown
Figure 6

This figure shows the Kaplan–Meier curve for the four-tiered grading system based on the WHO 1973 and 2004 classification systems to predict clinical progression of non-muscle invasive bladder cancer. The P-value (log-rank) was highly significant (P<0.001). Please note that 26 cases (23 LMP/G2 and 3 HG/G1) were excluded because these could not be assigned to one of the four-tiered categories. LMP, papillary urothelial neoplasia of low malignant potential; LG, low grade—papillary urothelial carcinoma; HG, high grade—papillary urothelial carcinoma; G1, grade 1; etc.; WHO, World Health Organization.

Discussion

The WHO 2004 classification system was adopted to reduce observer variability and provide better prognostic information for non-muscle invasive bladder cancer patients. The main advantage of the WHO 2004 system over the WHO 1973 system is that histological criteria have been defined for each category.19, 23 One of the criticisms of the new system is that it has been accepted despite a lack of clinical evidence and of proper studies to assess reproducibility and prognostic value.22, 23, 24 So far, studies on observer variability have shown no benefit of the WHO2004 grading system.25, 26 Recurrence in LMP, which is considered a benign lesion, varied between 25 and 60% in 11 studies,23 indicating that LMP requires surveillance as low-risk non-muscle invasive bladder cancer.27 Our study with long follow-up also showed no significant difference in recurrence between LMP and LG or HG non-muscle invasive bladder cancer. Our smaller sample size and the notion that grade is a better predictor for progression than recurrence may explain the absence of significance for recurrence in our study.6, 23, 25, 28 Previously, progression was found more frequently in G3 compared with HG as the HG group is more heterogeneous.23 We confirmed this in a larger series with one pathologist reviewing both WHO classifications. Otto et al29 specifically analyzed the two WHO classifications in 310 patients with primary pT1 bladder cancer and found that the WHO 1973 system was superior to the WHO2004 classification. Holmäng et al30 re-graded 255 pTaG1 into LMP and LG. Their number of progressive cases (N=6) was too low to find a significant difference. The three-tiered WHO2004 system is therefore not superior to the WHO1973 classification from a clinical point of view.14, 23, 25, 29 Consequently, management and/or follow-up schemes of non-muscle invasive bladder cancer patients have not changed.25, 27 One of the solutions may be a classification system for grade that includes elements from both WHO classification systems. Our study using clinical cross-validation confirms this concept as we found a significant stepwise increase in progression along the line LMP/G1, LG, G2, HG and G3.

In addition to clinical comparisons, we also validated both WHO systems with molecular markers. As for grade, molecular markers were more indicative for progression than for recurrence. As expected, we found that the frequency of FGFR3 mutations decreased with higher grades and altered expression of Ki-67, P53 and P27 increased with higher grades. Analogous to the clinical cross-validations, the cross-validations using the four molecular markers confirmed a stepwise increase in biological aggressiveness along the line LMP/G1, LG, G2, HG and G3. Only a few studies that compare molecular markers with progression in both WHO1973 and 2004 classifications are available.14, 31 In general, FGFR3, Ki-67, P53 and CK20 were found to predict progression in both WHO classifications.14, 31 FGFR3 was also significant in the pT1 and HG subgroups14 and P53 in G2 non-muscle invasive bladder cancer.31 A systematic molecular cross-validation was not reported in these studies.

Considering studies that reported on molecular or clinical (progression) validation using WHO1973 and 2004,5, 14, 23, 25, 29, 30, 31, 32 it is striking that the WHO1973 grade was only reviewed in one previous study.29 We reported that the pathologist’s mean grade is constant (≤0.1 grade-point) in both WHO classifications whereas differences in mean grade among four pathologists were as high as 0.7 grade point.26 Of note, the mean grade of the WHO2004 classification was 0.4 grade point higher than the mean grade of the WHO1973 classification. Moreover, progression in each grade category highly depended on the pathologist’s mean grade.26 This indicates that (central) pathology review by one pathologist is crucial if both WHO systems are compared with each other in one study. This may also be the most important reason that previous studies have not systematically reported on cross-validation as we did. Therefore, our study provides unique and original molecular and clinical data on both current WHO classifications for grade.

For clinicians, the G2 category in WHO1973 and the HG category in WHO2004 are troublesome. Although earlier efforts to subdivide the WHO1973 G2 non-muscle invasive bladder cancers into G2a and G2b resulted in useful prognostic information in a four-tiered classification,33 this subdivision did not gain widespread acceptance among pathologists. The WHO1999 classification subdivided HG into two separate categories8 but was also not accepted or universally used19 despite a recent editorial by Liedberg et al34 who made a plea using clinical and molecular data to distinguish between G2 and G3 as proposed in the WHO1999 classification. On the basis of molecular biology and clinical/multivariable data, our results support a four-tiered grading system using the 1973 and 2004 WHO classifications with one low-grade (LMP/LG/G1) category which includes LMP, two intermediate grade (LG/G2 and HG/G2) categories and one high-grade (HG/G3) category. In clinical practice and to avoid outliers like LMP/G2, we recommend first grading a non-muscle invasive bladder cancer according to WHO2004 and subsequently according to WHO1973. All LMP should be graded as G1, the LG carcinomas as G1 or G2 and the HG carcinomas as G2 or G3.
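
The grading rule described in this paragraph can be written as a small lookup. The sketch below is one reading of the text, not a validated tool: the assumption that the LMP/LG/G1 category consists of the LMP/G1 and LG/G1 combinations, the names `FOUR_TIER` and `four_tier_grade`, and the choice to return `None` for combinations that cannot be assigned (such as LMP/G2 or HG/G1) are all ours for illustration.

```python
from typing import Optional

# Combinations of (WHO2004, WHO1973) grade and the four-tiered category they map to.
# Assumption: LMP/LG/G1 covers both LMP/G1 and LG/G1, following the recommendation
# that all LMP be graded G1, LG as G1 or G2, and HG as G2 or G3.
FOUR_TIER = {
    ("LMP", "G1"): "LMP/LG/G1",
    ("LG", "G1"): "LMP/LG/G1",
    ("LG", "G2"): "LG/G2",
    ("HG", "G2"): "HG/G2",
    ("HG", "G3"): "HG/G3",
}


def four_tier_grade(who2004: str, who1973: str) -> Optional[str]:
    """Return the four-tiered category, or None if the pair cannot be assigned."""
    return FOUR_TIER.get((who2004, who1973))
```

Grading WHO2004 first and WHO1973 second, as recommended, keeps the pathologist within the combinations listed in the table and so avoids the unassignable pairs.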

The clinical recommendations in guidelines based on three-tiered grading systems are currently not clear for all risk groups.3, 4, 5 The proposed four-tiered system may help to improve treatment strategies for the G2 category in WHO1973 and the HG category in WHO2004. For example, LMP/LG/G1 may be managed by TUR and a single instillation of chemotherapy, LG/G2 by additional intravesical instillations of chemotherapy or BCG, HG/G2 by additional intravesical instillations of BCG and in case of HG/G3, early radical cystectomy may be considered. In this context, it also matters whether the pathologist is a high or low non-muscle invasive bladder cancer grader (high or low ‘mean’ grade).26 However, the four-tiered system may also suffer from reduced reproducibility compared with the three-tiered grading system. It is therefore important that pathologists and urologists work closely together to ensure optimal outcomes for non-muscle invasive bladder cancer patients.