Main

Renal cancer is the third most frequent malignancy of the urinary tract and accounts for 3% of all adult malignancies (Cohen and McGovern, 2005). Localised disease can be cured with surgery in most cases. Nevertheless, 50% of patients with renal cell carcinoma (RCC) will present with or develop metastatic disease (Flanigan et al, 2004; Cohen and McGovern, 2005). In this setting, prognosis remains poor and 5-year survival is <20% (Kavolius et al, 1998; Flanigan et al, 2004).

Recent advances in our understanding of the biology of RCC, especially the role of angiogenesis in the development and expansion of this tumour, have led to the development of novel therapies targeting vascular endothelial growth factor (VEGF) (Motzer et al, 2007; Escudier et al, 2007a, 2007b). These therapies proved superior to the previous standard, interferon (IFN). Sunitinib is an inhibitor of the split-kinase-domain family of receptor tyrosine kinases, which includes the VEGF receptors (Chow and Eckhardt, 2007). It was established as first-line treatment for advanced RCC following a randomised phase III trial, which showed a significant advantage over IFNα in progression-free survival (PFS) (Motzer et al, 2007). Despite this undisputed benefit, the prognosis of advanced RCC remains poor, and the toxicity of sunitinib (as well as that of other novel agents) is considerable (Bhojani et al, 2008). There is, therefore, a need to select the patients most likely to benefit from these therapies.

In the era of targeted therapies, specific prognostic algorithms are needed for clinical trial design, patient counselling and treatment decisions. Until recently, the most widely used prognostic model was that of the Memorial Sloan-Kettering Cancer Center (MSKCC). It uses five factors, each of which has been associated with independent prognostic significance (Motzer et al, 1999, 2001; Negrier et al, 2005; Escudier et al, 2007c):

- lactate dehydrogenase (LDH);
- Karnofsky performance status (KPS);
- time from nephrectomy;
- calcium level;
- haemoglobin level.

Combining these factors yielded a prognostic model with three patient groups that showed statistically significant and clinically relevant differences in survival (Motzer et al, 2001). The MSKCC model has been used in the design of all phase III trials of modern therapies. Nevertheless, its use in this context may have limitations:

- It was developed in patients treated with cytokines, whereas the prognosis of patients with metastatic renal cell carcinoma (mRCC) has improved considerably with targeted therapies.
- The randomised studies mainly included patients at low or intermediate risk, that is, populations whose composition differs from that of the population used to develop the MSKCC model.
- Analyses of contemporary series have not confirmed all factors in the model as significant, and other factors with independent prognostic significance have been suggested (Motzer et al, 2009; Beuselinck et al, 2011; Karakiewicz et al, 2011).
- Some studies suggest that novel models may perform better than the MSKCC model in the targeted therapy era (Karakiewicz et al, 2011).
- The model is fairly complex, requiring two clinical and three biochemical factors, which makes its application in retrospective analyses somewhat problematic.

Information on prognostic factors in the targeted therapy era is limited and heterogeneous (Choueiri et al, 2007; Motzer et al, 2008; Karakiewicz et al, 2011). 
These studies used one agent (Motzer et al, 2008; Karakiewicz et al, 2011) or more than one (Choueiri et al, 2007). All of them used PFS rather than overall survival (OS) as the endpoint (Choueiri et al, 2007; Motzer et al, 2008; Karakiewicz et al, 2011). Moreover, two of these studies (Motzer et al, 2008; Karakiewicz et al, 2011) included only patients participating in clinical studies, who may not be representative of the population treated in everyday practice. None of these models has been externally validated. For these reasons, their accuracy, performance characteristics and impact on clinical decisions remain unknown.

The most widely accepted model developed in the targeted therapy era is that of the International mRCC Database Consortium (IDC) (Heng et al, 2009). It was derived from a large multinational database (645 patients) of patients treated with first-line anti-VEGF therapy. Six factors (KPS, time from nephrectomy, calcium, haemoglobin, neutrophils and platelets) were used to identify three prognostic groups. This model has recently been externally validated and compared favourably with four other models, all of which, however, had been developed in the cytokine era (Heng et al, 2013). Nevertheless, the median follow-up was relatively short (16 months), and the model is more complex than the MSKCC model. In addition, treatment was heterogeneous, as three agents were used. Importantly, one of them, sorafenib, can be considered suboptimal first-line therapy. Unlike the other two agents, sunitinib and bevacizumab, it has not shown superiority to IFN in this setting (Escudier et al, 2009).

We have recently used the advanced RCC database of the Hellenic Co-operative Oncology Group (HECOG) to study prognostic clinicopathological factors in patients treated with sunitinib. In an initial analysis of 109 patients, we identified performance status (PS), time from diagnosis and number of metastatic sites as independent predictors of survival (Bamias et al, 2010). Combining these factors produced a prognostic model whose performance was similar to that of the more complex MSKCC model. Here we report an updated analysis of 170 patients and an external validation of this model. We also studied the performance of the IDC model in our development and validation cohorts.

Materials and Methods

Patient population

The development cohort included 170 consecutive patients with mRCC from nine Greek centres, treated between October 2005 and December 2010. The validation cohort included 266 consecutive patients treated at three French centres and one Belgian centre between November 2005 and January 2012 (Table 1). Most of this database has been used in a previous analysis of prognostic factors for PFS and OS (Beuselinck et al, 2011). The analysis was approved by the Institutional Review Boards of the participating institutions, and informed consent for the use of medical data for research purposes was obtained.

Table 1 Participating centres (number of patients, %) in the development and validation cohorts

The criteria for inclusion in this analysis were a diagnosis of mRCC and treatment with sunitinib. Previous IFNα was allowed, but previous anti-VEGF therapy was not. Baseline demographic, clinical and laboratory data with prognostic significance, according to published reports and the authors’ experience, were collected retrospectively from medical charts. Uniform database templates were used to ensure consistent data collection. Overall survival data were available for all patients. The databases were updated in January 2012, before the final analysis.

Statistical analysis

All analyses were carried out in STATA/SE 11.2.

Description of data and model construction

Patients’ characteristics were summarised using means, medians and proportions. Overall survival was the primary endpoint. It was defined as the time from the first cycle of sunitinib to death from any cause; patients still alive were censored at the date of last contact. Survival curves were estimated with the Kaplan–Meier method. The following factors were considered for their prognostic ability:

- age (≤60 vs >60 years);
- sex (female vs male);
- Eastern Co-operative Oncology Group (ECOG) PS (≥1 vs 0);
- time from diagnosis to treatment with sunitinib (≤12 vs >12 months);
- number of metastatic sites (>2 vs 0–2);
- tumour grade (III+IV vs I+II);
- nephrectomy (no vs yes);
- previous IFNα (no vs yes);
- histology (clear cell vs other);
- alkaline phosphatase (abnormal vs normal);
- LDH (abnormal vs normal);
- calcium (>10 vs ≤10 mg dl⁻¹);
- platelets (>400 × 10³ vs ≤400 × 10³ per mm³);
- haemoglobin (≤13 g dl⁻¹ for males or ≤11.5 g dl⁻¹ for females vs >13 g dl⁻¹ for males or >11.5 g dl⁻¹ for females);
- neutrophils (>5000 vs ≤5000 per mm³);
- white blood cells (WBC; >10 000 vs ≤10 000 per mm³);
- liver metastases (yes vs no);
- brain metastases (yes vs no);
- bone metastases (yes vs no);
- lung metastases (yes vs no).
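The Kaplan–Meier estimator used for the observed survival curves can be illustrated with a minimal pure-Python sketch. This is not the STATA routine used in the analysis. The function names `kaplan_meier` and `median_survival` and the toy data are hypothetical. Follow-up is measured from the first sunitinib cycle, and a patient alive at last contact enters with event = 0.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimator.

    times  -- follow-up from the first sunitinib cycle (e.g. months)
    events -- 1 if death was observed, 0 if censored at last contact
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone whose follow-up ends at t is still at risk at t (censored ones included)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # conditional probability of surviving t
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def median_survival(steps: List[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which estimated survival drops to 0.5 or below."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical toy cohort: six patients, two censored
steps = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
print(steps)
print(median_survival(steps))  # 5
```

A censored patient whose time ties with a death is kept in the risk set at that time, which is the usual Kaplan–Meier convention.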

For some patients in the development cohort, some data were missing. These comprised laboratory data (platelets, neutrophils, WBC, haemoglobin, calcium, LDH and alkaline phosphatase (ALP)) and data on other variables (tumour grade, age, histology, and liver, lung, bone and brain metastases). To account for the missing values, we used multiple imputation with the Markov chain Monte Carlo (MCMC) method for arbitrary missing-data patterns. The imputed data were generated from the following variables: number of metastatic sites, PS, time from tumour diagnosis, sex, previous IFNα, previous nephrectomy and survival status.
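The text does not say how the per-imputation Cox estimates were combined. Rubin's rules are the standard choice and are what Stata's `mi estimate` applies, so the sketch below assumes them. The function name `pool_rubin` and the numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Combine per-imputation estimates with Rubin's rules.

    estimates -- one coefficient (e.g. a Cox log-HR) per imputed data set
    variances -- the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                              # pooled point estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = w + (1 + 1 / m) * b                             # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log-HRs for one factor from three imputed data sets
log_hr, se = pool_rubin([0.60, 0.70, 0.65], [0.04, 0.05, 0.045])
print(math.exp(log_hr), se)  # pooled HR and standard error of the log-HR
```

The between-imputation term inflates the standard error to reflect the uncertainty introduced by the missing values.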

The association of each of the factors listed above with OS was assessed after multiple imputation. It was measured by hazard ratios (HRs) estimated from univariate Cox proportional hazards models. Factors whose HRs were statistically significant at the 0.2 level after multiple imputation were then entered into a multivariate Cox proportional hazards model. The final predictive model included only those variables whose estimated HRs were statistically significant at the 5% level (P<0.05).

After the final model was defined, patients were classified into risk groups in two ways:

- four groups, according to how many of the prognostic factors in the final model were present;
- three groups (good, intermediate and poor risk), using the 25th and 75th percentiles of the distribution of the model’s prognostic index (risk score) as cutoffs.

The former classification is familiar in the clinical setting, whereas the latter has been suggested in recent studies (Royston et al, 2010).
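The first scheme is simply the count of risk factors present (0 to 3). The percentile-based scheme can be sketched as follows. The sketch assumes a linear-interpolation definition of the percentiles and assigns a patient lying exactly on a cutoff to the lower-risk group. The text states neither convention, and STATA's percentile commands may define percentiles differently. All names and index values are hypothetical.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_groups(index: Sequence[float]) -> List[str]:
    """Good / intermediate / poor risk, split at the 25th and 75th percentiles
    of the prognostic index in the cohort being classified."""
    p25, p75 = percentile(index, 25), percentile(index, 75)
    groups = []
    for x in index:
        if x <= p25:
            groups.append("good")
        elif x <= p75:
            groups.append("intermediate")
        else:
            groups.append("poor")
    return groups


pi = [0.0, 0.0, 0.5, 0.8, 1.2, 1.544, 2.0, 2.3]  # hypothetical index values
print(percentile_groups(pi))
```

The cutoffs are computed from whichever cohort is passed in. Applying the function to validation-cohort indices therefore reproduces the approach of using that cohort's own percentile distribution.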

Internal and external validation and calibration of the model

ROC curves and bootstrap-corrected Harrell’s C-index were used to assess the model’s discriminatory ability in the development cohort (internal validation) (Pencina and D’Agostino, 2004). The C-index was corrected by bootstrapping with 200 resamples. This provides a nearly unbiased estimate of the model’s ability to discriminate, within the development cohort, between patients who died earlier and those who survived longer.
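Harrell's C-index can be illustrated with a minimal pure-Python version. It is a simplified sketch: pairs with tied follow-up times are skipped, and the 200-resample bootstrap correction is not shown. It is not the STATA implementation used in the analysis, and `harrell_c` is a hypothetical name. Higher prognostic-index values are taken to mean higher risk. A value of 1 therefore means perfect ranking, and 0.5 means ranking no better than chance.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is usable when the shorter follow-up ends in an observed death.
    It is concordant when that patient also has the higher risk score; ties in
    the risk score count one half. Pairs with tied follow-up times are skipped
    in this simplified version.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: earlier deaths have higher index values, so ranking is perfect
print(harrell_c([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```

Because a three-factor model has few distinct index values, tied risk scores are common. The half-credit rule for ties therefore matters in practice.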

External validation was performed by calculating a risk score for each patient in the validation cohort using the prognostic factors and the respective Cox regression coefficients of the model as estimated in the development data set. Patients were stratified according to their risk of death in the same way as in the development cohort, but using the distribution of risk scores in the validation data set. The model’s discriminatory ability in the validation cohort was checked with the C-index.

The model was recalibrated using the method described by Miller and Hui (1991), as applied to the Cox proportional hazards (PH) model. With this method, the need to include a slope for the prognostic index in the model is tested. If a slope is required, the recalibrated model is used to estimate survival probabilities for subjects in the validation cohort.

The predictive ability of the model, with and without calibration, was assessed by plotting the observed and predicted survival curves for each risk group in the development and validation data sets.
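The text does not describe how the predicted curves were computed, so the following approach is an assumption. One common way is to estimate the baseline cumulative hazard H₀(t) with the Breslow estimator in the development cohort. The individual predictions S(t | PI) = exp(−H₀(t)·exp(PI)) are then averaged over the patients in each risk group. After recalibration, PI would be replaced by the estimated slope times PI. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def breslow_baseline(times: Sequence[float], events: Sequence[int],
                     pi: Sequence[float]) -> List[Tuple[float, float]]:
    """Breslow estimate of the baseline cumulative hazard H0(t) at each death time."""
    steps = []
    cum = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        at_risk = sum(math.exp(p) for ti, p in zip(times, pi) if ti >= t)
        cum += deaths / at_risk
        steps.append((t, cum))
    return steps


def predicted_survival(h0: List[Tuple[float, float]], pi_value: float, t: float) -> float:
    """S(t | PI) = exp(-H0(t) * exp(PI)), with H0 a right-continuous step function."""
    cum = 0.0
    for ti, h in h0:
        if ti > t:
            break
        cum = h
    return math.exp(-cum * math.exp(pi_value))


def group_curve(h0: List[Tuple[float, float]], group_pi: Sequence[float], t: float) -> float:
    """Predicted survival of a risk group: mean of its members' individual curves."""
    return sum(predicted_survival(h0, p, t) for p in group_pi) / len(group_pi)


# Hypothetical development data (all index values 0, so H0 reduces to Nelson-Aalen)
h0 = breslow_baseline([1, 2, 3], [1, 1, 1], [0.0, 0.0, 0.0])
print(group_curve(h0, [0.0, 0.5], 2.5))
```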

Results

Baseline characteristics of the development and validation cohorts

Table 2 details the baseline characteristics of the patients in the development and validation cohorts who were included in the analyses. Median follow-up was 35.51 months for the development cohort and 37.55 months for the validation cohort. During follow-up, 103 patients (61%) in the development cohort and 151 (57%) in the validation cohort died. Median OS was 19.4 months (95% confidence interval (CI) 15.1–24.7) in the development data set and 26.1 months (95% CI 20.4–31.9) in the validation data set. Significantly more patients in the validation cohort had undergone nephrectomy, had previously received IFNα, had clear cell histology, were more than 12 months from diagnosis of RCC, had normal LDH and had more than two metastatic sites. In addition, more patients in the validation cohort were classified into the favourable and intermediate risk groups of the MSKCC and IDC models.

Table 2 Baseline characteristics of patients in the development and validation cohorts

Construction of the predictive model

Model selection in the development cohort is shown in Table 3. In univariate analysis, the following were associated with improved survival:

- male sex;
- PS 0;
- >12 months from diagnosis to sunitinib initiation;
- 0–2 metastatic sites;
- previous nephrectomy;
- normal LDH, alkaline phosphatase and calcium;
- platelet count ≤400 × 10³ per mm³;
- absence of anaemia;
- absence of bone, lung or brain metastases.

Our final model (last two columns of Table 3) included the three factors found significant in multivariate analysis. In order of significance, these were number of metastatic sites, ECOG PS and time from diagnosis to sunitinib. A test of the proportional hazards assumption revealed no violation (P=0.392).

Table 3 Model selection through univariate and multivariate Cox models in the development cohort

Risk stratification in the development cohort

The prognostic index was calculated for each patient as the sum of the variables in the final model, each multiplied by the log of its HR (Table 3). Lower index values indicate a lower risk of death. Patients were classified according to their risk of death in two ways. First, they were divided into four groups by the number of risk factors of the final model present: no factors, any one factor, any two factors or all three factors. Second, they were divided into three groups (good, intermediate and poor risk) by splitting the index at 0 (25th percentile) and 1.544 (75th percentile). Figure 1A and B (solid lines) show the observed survival curves for both classification schemes.
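In formula form, the prognostic index and the percentile-based grouping can be written as below. Here x_ik is 1 if patient i has risk factor k (more than two metastatic sites, ECOG PS ≥1, or ≤12 months from diagnosis to sunitinib) and 0 otherwise. β̂_k is the corresponding Cox coefficient from Table 3, and P₂₅ and P₇₅ are the cohort percentiles of the index. These symbols are introduced here for illustration, and placing a patient with PI exactly at a cutoff in the lower-risk group is an assumed convention.

```latex
\mathrm{PI}_i = \sum_{k=1}^{3} \hat\beta_k \, x_{ik}, \qquad \hat\beta_k = \ln \widehat{\mathrm{HR}}_k, \qquad x_{ik} \in \{0, 1\}

\text{risk group}_i =
\begin{cases}
\text{good}, & \mathrm{PI}_i \le P_{25} \\
\text{intermediate}, & P_{25} < \mathrm{PI}_i \le P_{75} \\
\text{poor}, & \mathrm{PI}_i > P_{75}
\end{cases}
```

Because the three factors are binary, the index can take at most 2³ = 8 distinct values. The risk-factor coding implies that every HR exceeds 1, so PI_i = 0 only for patients with none of the three factors. Under the usual percentile definitions, a 25th percentile of 0 therefore means that at least a quarter of the development cohort had no risk factors.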

Figure 1

Observed (solid lines) and predicted (dashed lines) overall survival for the development data set by risk classification according to (A) the number of risk factors and (B) percentiles of the prognostic index.

Internal validation

Internal validation of the model’s discriminatory ability yielded a C-index of 0.709. After bootstrapping with 200 resamples, the corrected C-index was 0.712. This indicates good discriminatory performance: subjects with longer predicted survival times also had longer actual survival. The model’s performance is also illustrated in Figure 1A and B (dashed lines), where the predicted survival curves are very close to the observed ones.

External validation

External validation was performed by applying the log HRs shown in Table 3 (7th column) to each patient in the validation data set to calculate the prognostic index. Risk groups were formed as in the development data set. However, the cutoffs for the 25th (0) and 75th (1.543) percentiles were derived from the distribution of the prognostic index in the validation data set. The C-index of the model in the validation data set was 0.634, indicating lower discriminatory ability in the validation cohort than in the development cohort.

Figure 2A and B show the observed and predicted survival curves for the validation data set according to the two classification schemes. Predicted survival was similar to observed survival in the best-prognosis group of each classification. It deviated in the other risk groups, where observed survival was longer than predicted. This is also evident in Figure 3A and B, where observed survival is plotted for the development (solid lines) and validation (dashed lines) data sets.

Figure 2

Observed (solid lines) and predicted (dashed lines) overall survival by risk classification according to the number of risk factors (A, C) or the percentiles of the prognostic index (B, D) for the validation data set before (A, B) and after (C, D) recalibration.

Figure 3

Observed overall survival for the development (solid lines) and validation data sets (dashed lines) by risk classification according to number of risk factors (A) or percentiles of the prognostic index (B).

When the model was recalibrated, the inclusion of a slope for the prognostic index was statistically significant (P<0.001), with an estimated slope of 0.553 (s.e.=0.117). The recalibrated prognostic index was therefore used to predict survival in the validation data set. Figure 2C and D show the resulting improvement in prediction: the deviation between observed and predicted survival in the validation data set is smaller than before recalibration. Table 4 summarises survival according to risk classification in the development and validation cohorts after recalibration.
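A short calculation illustrates what a slope of 0.553 means. It assumes that the reported P value comes from a Wald test of the slope against 1 (perfect calibration). It takes the validation-cohort 75th percentile of the index (1.543) as an example patient and compares that patient's HR, relative to a patient with no risk factors, before and after recalibration.

```python
import math

slope, se = 0.553, 0.117            # reported calibration slope and its standard error
z = (slope - 1.0) / se              # Wald statistic for H0: slope == 1 (assumed test)
p = math.erfc(abs(z) / math.sqrt(2.0))
ci = (slope - 1.96 * se, slope + 1.96 * se)

pi_75 = 1.543                       # validation-cohort 75th percentile of the index
hr_before = math.exp(pi_75)         # HR vs PI = 0 with development coefficients
hr_after = math.exp(slope * pi_75)  # HR vs PI = 0 after recalibration

print(f"z={z:.2f}, P={p:.1e}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}")
print(f"HR before {hr_before:.2f}, after {hr_after:.2f}")
```

The HR falls from about 4.7 to about 2.3, and the 95% CI of the slope (about 0.32–0.78) excludes 1, consistent with the reported P<0.001. A slope below 1 indicates that the development-cohort coefficients overstate the separation between risk groups in the validation cohort. This is in line with the better-than-predicted survival seen in the higher-risk groups.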

Table 4 Survival according to risk stratification in the development and validation cohorts

Because our model showed lower discriminatory ability in the validation cohort than in the development cohort, we also evaluated the IDC model in both cohorts. This provided an indirect comparison with our predictive model. We calculated a prognostic index from the parameters and respective estimates given by Heng et al (2009) and estimated predicted survival from this index in both data sets.

ECOG PS was converted to KPS as follows:

- KPS 100 was considered equal to ECOG PS 0;
- KPS 80–90 was considered equal to ECOG PS 1;
- KPS ≤70 was considered equal to ECOG PS ≥2.

Risk groups were formed from six factors according to the published model (Heng et al, 2009): PS, time from nephrectomy, calcium level, haemoglobin level, neutrophil count and platelet count, categorised as shown in Table 3. Patients with 0 factors were classed as favourable risk, those with 1–2 factors as intermediate risk and those with 3–6 factors as poor risk.

The C-index of the IDC model was 0.574 in the development data set and 0.576 in the validation data set. This modest discriminatory ability was accompanied by predicted survival that was worse than observed survival in both the development (Figure 4A) and validation (Figure 4B) data sets. When the IDC model was recalibrated, slopes of 0.555 (P<0.001) and 0.580 (P<0.001) were estimated for the development and validation cohorts, respectively. After recalibration, predicted survival was much closer to observed survival, especially in the validation cohort (Figure 4C and D). Table 4 shows an overall evaluation of the IDC model with respect to survival in the development and validation cohorts after recalibration. Of note, in contrast to our model, survival in each IDC risk group was quite similar between the development and validation cohorts. 
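The IDC risk-group assignment can be sketched as a simple factor count. The thresholds follow the categorisation listed in the Methods (Table 3). The KPS factor is taken as KPS <80%, as in Heng et al (2009), which under the ECOG-to-KPS mapping corresponds to ECOG PS ≥2. Using ≤12 months from nephrectomy to sunitinib as the time criterion is an assumption of this sketch. The `Patient` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    ecog_ps: int
    months_nephrectomy_to_sunitinib: float
    calcium_mg_dl: float
    haemoglobin_g_dl: float
    male: bool
    neutrophils_per_mm3: float
    platelets_per_mm3: float


def idc_factor_count(p: Patient) -> int:
    """Number of the six IDC adverse factors present (0-6)."""
    anaemia_cutoff = 13.0 if p.male else 11.5  # sex-specific threshold, g/dl
    factors = [
        p.ecog_ps >= 2,                           # KPS <80 (ECOG 2 maps to KPS 70 or less)
        p.months_nephrectomy_to_sunitinib <= 12,  # assumed 12-month cutoff
        p.calcium_mg_dl > 10,                     # raised calcium
        p.haemoglobin_g_dl <= anaemia_cutoff,     # anaemia
        p.neutrophils_per_mm3 > 5000,             # raised neutrophils
        p.platelets_per_mm3 > 400_000,            # raised platelets
    ]
    return sum(factors)


def idc_risk_group(n_factors: int) -> str:
    """Favourable: 0 factors; intermediate: 1-2; poor: 3-6."""
    if n_factors == 0:
        return "favourable"
    if n_factors <= 2:
        return "intermediate"
    return "poor"


# Hypothetical patient with no adverse factors
print(idc_risk_group(idc_factor_count(Patient(1, 30, 9.5, 14.0, True, 4000, 250_000))))
```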
We also compared the predictive performance of our model with that of the IDC model (Heng et al, 2009) and the MSKCC model (Motzer et al, 2001), using ROC curves in the development and validation data sets (data not shown). No statistically significant differences were seen in the validation data set. Our model performed better in the development data set; this was somewhat expected, as our model was derived to maximise predictive ability for survival in that data set.

Figure 4

Observed (solid lines) and predicted (dashed lines) overall survival by risk classification according to the number of risk factors using the model proposed by mRCC IDC in the development (A, C) or the validation (B, D) data sets before (A, B) and after (C, D) recalibration.

Discussion

An ideal prognostic model should be easy to use, include only the most relevant patient and disease characteristics and accurately distinguish patient groups with different prognosis. Our model fully meets the first two criteria and has satisfactory discriminatory ability, although there is room for improvement.

Other prognostic models for mRCC have been proposed by the MSKCC (Motzer et al, 2001), the Cleveland Clinic Foundation (Choueiri et al, 2007), French investigators (Negrier et al, 2005), the International Kidney Cancer Working Group (IKCWG) (Manola et al, 2011) and the IDC (Heng et al, 2009). All these models except the IDC model are based either on outcomes of patients treated with immunotherapy or on single-institution experience, and they have not always been externally validated. Our model is simpler, including only three clinical factors that are usually readily available for every patient with mRCC. This is supported by the fact that this information was available for almost all patients in the validation cohort, although patients were not selected on the basis of its availability.

Two of the three prognostic factors in our model, PS and time from diagnosis, have consistently been found significant in all relevant studies, in both the cytokine and the targeted therapy era (Motzer et al, 1999, 2001; Negrier et al, 2005; Choueiri et al, 2007; Escudier et al, 2007c; Motzer et al, 2008; Heng et al, 2009; Motzer et al, 2009; Beuselinck et al, 2011; Karakiewicz et al, 2011). The third, number of metastatic sites, has been shown to be an independent prognostic factor in several mRCC series (Negrier et al, 2005; Escudier et al, 2007c; Manola et al, 2011; Poprach et al, 2012).

Several features of our study support the applicability of our model in everyday practice:

- our patients were homogeneously treated with sunitinib, one of the most active agents in mRCC (Patard et al, 2011);
- the follow-up is among the longest reported in studies of targeted therapies (Heng et al, 2009; Beuselinck et al, 2011; Karakiewicz et al, 2011; Heng et al, 2013);
- most patients were not enrolled in clinical trials.

Internal validation showed good discriminatory ability, with a C-index of 0.712, similar to that reported for the IDC model (Heng et al, 2009). Two methods of risk stratification were used. 
Neither was found to be superior to the other. We believe that stratification according to the number of risk factors preserves the simplicity of the model and is more easily applied in the clinical setting. This classification clearly identifies a poor-prognosis group (three risk factors) that does not seem to benefit from sunitinib therapy (median OS, 5.9 months). Such a poor-outcome group has not been identified in previous studies, and its identification is an advantage of the proposed model.

External validation yielded a C-index of 0.634, lower than that obtained at internal validation. Nevertheless, this value is comparable to those reported for other published models when tested in independent data sets (Heng et al, 2013). The poorer performance was found mainly in the groups with inferior prognosis, where observed survival was better than that predicted by our model, although recalibration reduced these differences. The reasons for this discrepancy are unclear.

There were imbalances between the two cohorts in certain baseline characteristics and in the distribution across MSKCC and IDC risk groups, in favour of the validation cohort. Such imbalances are not infrequent and have also been reported in other similar studies (Kang et al, 2012; Poprach et al, 2012; Yi et al, 2012). We handled the imbalanced factors as follows:

- Time from diagnosis and number of metastatic sites are included in the final model, so their imbalance is accounted for.
- Most patients in the validation cohort had undergone nephrectomy, so a separate validation analysis restricted to nephrectomised patients was performed. This did not significantly improve our results (data not shown).
- The other three imbalanced factors were not investigated further. Previous IFNα and histology (non-clear cell) were not significant in univariate analysis, and LDH was not available for 27% of our patients, which limits the power of further analyses.

Other factors may also have affected our results. It has recently been suggested that patients eligible for clinical trials may have different outcomes from non-eligible patients (Heng et al, 2012). In addition, survival in expanded access programmes (EAPs) for sunitinib has been lower than in the randomised study (Gore et al, 2009; Motzer et al, 2009). Most patients from the French centres had been enrolled in clinical trials, in contrast to the Greek and Belgian patients. 
Not all of our patients would have been ineligible for trials. Even so, the median survival of our cohort resembled that of the EAP, while the median OS of the validation cohort approximated that of the randomised study. Inclusion in clinical studies may affect outcome through more thorough tumour evaluation and follow-up, which may be particularly relevant to the detection of metastatic sites. For these reasons, we performed two additional analyses: one using only the Belgian patients as the validation cohort, and one using a model with only PS and time from diagnosis. These analyses did not improve the performance of our model (data not shown). However, the small numbers of patients in these subgroups limit their interpretation.

Among the previously developed models, the IDC model (Heng et al, 2009) is rapidly gaining acceptance. It is the only one developed in patients treated with targeted therapies, it has been externally validated, and it seems to have higher stratification capability than the others (Heng et al, 2013). We therefore attempted to validate this model in our two independent, homogeneously treated, non-selected populations. Median OS of the IDC risk groups was fairly similar between the development and validation cohorts, which is an improvement over our model and supports the applicability of the IDC model in mRCC patients. Nevertheless, in both cohorts the C-index was below 0.6. This is lower than the 0.634 of our model and the 0.664 obtained in the external validation of the IDC model (Heng et al, 2013).

Again, the most notable deviation of predicted from observed survival was found in the poor-risk groups, where median OS (13.6 and 11.2 months) was higher than the 7.8 months reported by Heng et al (2013). That lower reported survival could be attributed, at least partially, to the fact that one-third of those patients received sorafenib, which is considered inferior to sunitinib, as first-line treatment. Nevertheless, sorafenib-treated patients had similar OS, and no data were available on the treatment of the poor-risk group.

Both models performed less satisfactorily in poor-risk patients, a group under-represented in clinical trials of targeted therapies. This underlines the need for better characterisation of this group through more focused clinical research. In addition, both models yielded lower C-indices at external validation than at internal validation. This underlines the importance of external validation and the need for confirmation in multiple data sets before a proposed prognostic model is widely accepted. 
There may be as yet unidentified factors that affect outcome in mRCC patients treated with anti-VEGF therapies and that might account for the limitations of the existing models. Recent data (Peña et al, 2010; Sun et al, 2011) suggest that adding molecular factors may improve the performance of models that rely purely on clinical factors.

In conclusion, we externally validated a simple model that could be used to stratify patients with mRCC who are offered sunitinib. We believe that it could be used with any anti-VEGF therapy, but this remains to be confirmed. The predictive accuracy of this model appears comparable to that of the more complex IDC model, and it could therefore represent a valid alternative. Both models performed less well in poor-risk populations, suggesting that further refinement in additional independent data sets may be appropriate.