Main

Upper urinary tract urothelial carcinoma (UUT-UC) is an uncommon disease. Radical nephroureterectomy (NU) is the standard of care for invasive, nonmetastatic UUT-UC. However, UUT-UC is a biologically aggressive malignancy with the potential for disease recurrence and cancer-specific mortality (CSM). The pathological T and N categories and pathologically defined tumour grade are powerful indicators of prognosis. Several other postoperative prognostic risk factors have been identified to help in the clinical decision-making for optimal management.

Improved risk stratification and accurate individualised prediction of postoperative survival outcomes can help guide patient counseling, follow-up scheduling, administration of adjuvant therapies, and design of clinical trials (Cha et al, 2012). Decision aids, such as risk groupings, probability tables, neural networks, and nomograms, have greatly enhanced the ability to predict outcomes. Among the available decision aids, nomograms are the most accurate and discriminatory tools for predicting outcomes in patients with UC (Shariat et al, 2009). In contrast to models that assign prognosis based on risk groups, nomograms provide prognostic information based on a combination of variables that allow for a more individualised prediction of outcome.

To date, several nomograms have been established to predict disease stage at the time of radical NU (Margulis et al, 2010; Favaretto et al, 2012) or perioperative mortality (Jeldres et al, 2010a) and to predict prognosis for patients after radical NU (Jeldres et al, 2010b; Cha et al, 2012; Rouprêt et al, 2013; Yates et al, 2012). All nomograms consider clinical and/or pathological variables, but they differ with regard to the number and type of covariates and end points. Because some nomograms include an obsolete historical tumour grading system as one of their four variables (Jeldres et al, 2010b; Cha et al, 2012), they are not usable in daily clinical practice. Recently, Yates et al (2012) developed a new nomogram based on the French collaborative national database on UUT-UC to predict 3- and 5-year cancer-specific survival (CSS). This nomogram was developed from a large cohort of 397 patients treated between 1995 and 2010 at 11 major medical centres throughout France. Its performance was subsequently validated in 270 patients at another 10 institutions in France (Yates et al, 2012). The nomogram uses the recommended World Health Organisation (WHO) grading system for UC, and it is the only one available as an online calculator. Online availability allows easy and unrestricted access by physicians at any institution worldwide, and the calculator generates results quickly, facilitating its use in patient counseling and tumour board discussions (Hessman et al, 2011).

To our knowledge, the Yates nomogram has not been validated separately in another cohort of patients who underwent radical NU. Despite the good performance of the nomogram in a separate data set, concerns about its generalisability remain. The objective of the current study was to validate the online nomogram in a different cohort.

Materials and methods

Patient selection

The study was approved by the institutional review board before its initiation. From October 1991 to October 2010, medical records of patients treated by surgery for UUT-UC at our institution were retrospectively reviewed. We excluded 18 patients treated by segmental resection, 11 patients who underwent previous or concurrent radical cystectomy, 9 patients with distant metastasis before radical NU, 6 patients who received neoadjuvant chemotherapy, 3 patients with non-UC, and 6 patients with incomplete information on patient age, pathological tumour and nodal stage, lymphovascular invasion, concomitant carcinoma in situ, neoadjuvant or adjuvant chemotherapy, and radiotherapy status. The study comprised the remaining 328 patients with UUT-UC who underwent open (n=251) or laparoscopic (n=77) radical NU. Demographic data for the nomogram development cohort and the external validation cohort are compared in Table 1. The hilar and regional lymph nodes adjacent to the ipsilateral great vessel were generally resected if palpable intraoperatively or enlarged on preoperative axial imaging. The extent of lymphadenectomy was at the discretion of the individual surgeon.

Table 1 Descriptive statistics of development cohort and external validation cohort

Pathology review

All surgical specimens were processed and evaluated according to the standard pathological procedures by staff surgical pathologists at our institution. Pathological stage was assigned according to the 2002 TNM classification of the American Joint Committee on Cancer-Union for International Cancer Control (AJCC/UICC) (Greene et al, 2002), and tumour grade was assessed according to both the recommended 1973 WHO system and the 1998/2004 International Society of Urological Pathology/WHO consensus classification (Mostofi et al, 1973; Epstein et al, 1998).

Follow-up

Patients were generally followed up after radical NU at least every 3–4 months for the first year and then were followed up semi-annually from the second through the fifth year. After 5 years, patients were seen annually. Follow-up consisted of physical examination, serum chemistry evaluation, urinary cytology, cystoscopic evaluation of the urinary bladder, chest X-rays, renal ultrasound, and/or computed tomography scan. CSM was defined as time from radical NU to death from UUT-UC. Cause of death was determined by chart review, the Korea National Statistical Office, and/or confirmed by interview with the family of the patient. Perioperative mortality was censored at the time of death for CSM analyses.

Online nomogram

An online version of the risk calculator developed by Yates et al (2012) is available at http://labs.fccc.edu/nomograms/nomogram.php?id=66&audience=1. The nomogram requires tumour location (coded as renal pelvis=1, ureter=2, and ureter and renal pelvis=3), tumour grade (coded as grade 1=1, grade 2=2, and grade 3=3), age (entered as a continuous variable from 20 to 100), pT stage (coded as pTa=1, pT1/CIS=2, pT2=3, and pT3/4=4), and pN stage (coded as pN0/x=1 and pN+=2). The data of each patient were entered into the online tool, and the software calculated the probability of CSM at 3 and 5 years after radical NU.
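The coding scheme described above can be illustrated with a short sketch. It only maps clinical categories to the numeric codes listed in the text; it does not reproduce the calculator itself, whose coefficients are not given here. The names `NomogramInput` and `encode_patient`, and the string labels used as dictionary keys, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Category codes exactly as listed in the text; the key spellings are assumptions.
LOCATION_CODES = {"renal pelvis": 1, "ureter": 2, "ureter and renal pelvis": 3}
GRADE_CODES = {1: 1, 2: 2, 3: 3}
PT_CODES = {"pTa": 1, "pT1": 2, "CIS": 2, "pT2": 3, "pT3": 4, "pT4": 4}
PN_CODES = {"pN0": 1, "pNx": 1, "pN+": 2}


@dataclass(frozen=True)
class NomogramInput:
    """Numeric inputs in the order required by the online tool (hypothetical container)."""
    location: int
    grade: int
    age: float
    pt_stage: int
    pn_stage: int


def encode_patient(location, grade, age, pt_stage, pn_stage):
    """Translate clinical categories into the numeric codes described in the text."""
    # Age is a continuous variable restricted to the range 20 to 100.
    if not 20 <= age <= 100:
        raise ValueError("age must lie between 20 and 100")
    return NomogramInput(
        location=LOCATION_CODES[location],
        grade=GRADE_CODES[grade],
        age=float(age),
        pt_stage=PT_CODES[pt_stage],
        pn_stage=PN_CODES[pn_stage],
    )
```

For example, `encode_patient("ureter", 3, 67, "pT3", "pN+")` yields location 2, grade 3, age 67.0, pT code 4, and pN code 2. An unknown category raises a `KeyError`, which makes data-entry mistakes visible before values are typed into the calculator.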

Statistical analysis

Multivariate survival analysis with calculation of hazard ratio (HR) with 95% confidence interval (CI) was performed using the Cox proportional hazards model including all covariates that were significant on univariate analysis. Competing risk regression analysis was also performed by calculating the cumulative incidence function.
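As an illustration of the cumulative incidence function mentioned above, the following sketch implements a simple nonparametric estimator of the Aalen-Johansen type for competing risks. It is not the regression procedure used in the study and makes no claim about the software settings the authors applied. The function name and the cause coding (0 = censored, positive integers = causes of death) are assumptions.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  : follow-up time of each patient
    causes : 0 for censored, otherwise an integer cause code (assumed coding)
    Returns a list of (event_time, cumulative_incidence) pairs.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    overall_survival = 1.0  # probability of being free of any event just before t
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        deaths_k = sum(
            1 for u, c in zip(times, causes) if u == t and c == cause_of_interest
        )
        # Increment: P(event-free before t) * cause-specific hazard at t.
        cif += overall_survival * deaths_k / at_risk
        # Update all-cause survival after processing time t.
        overall_survival *= 1.0 - deaths_any / at_risk
        curve.append((t, cif))
    return curve
```

With this estimator, the incidences of all causes at a given time add up to one minus the all-cause survival, which is why it is preferred to treating competing deaths as censored observations.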

The predicted risk of CSM was compared with the actual CSM in the current study population at 3 and 5 years. The discrimination of models was quantified using Harrell’s concordance index (Harrell et al, 1996), which is similar to the area under the receiver operating characteristic curve but can be calculated for continuous and censored data (such as time-to-event data). Harrell’s concordance index measures how well a tool can rank the relative risk of individual patients in the population sample. A value of 1.0 indicates perfect predictions, whereas 0.5 is equivalent to a toss of a coin.
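The idea behind the concordance index can be made concrete with a minimal sketch. A pair of patients is usable when the patient with the shorter follow-up had an observed event; the pair is concordant when that patient also had the higher predicted risk. The function name `harrell_c` is hypothetical, and pairs with identical follow-up times are simply skipped here, which is a simplification rather than the convention of any particular statistical package.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data (simplified sketch).

    times       : follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : predicted risk, higher meaning worse expected outcome
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A model that ranks every usable pair correctly returns 1.0, a model that assigns the same risk to everyone returns 0.5, and a model that ranks every pair in the wrong order returns 0.0.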

Calibration reflects how well the tool predicts an absolute outcome, such as the likelihood of survival at 3 and 5 years. The relationship between the model-derived and actuarial CSM was explored graphically in calibration plots. External validation was done using 200 bootstrap resamples to decrease overfit bias. Calibration was also assessed using the quartiles of the predicted survival at 3 and 5 years and calculation of the corresponding observed Kaplan–Meier estimates. To quantify the net benefit of the nomogram, we relied on decision curve analysis, which examines the theoretical relationship between the threshold probability of developing an event and the relative value of false-positive and false-negative results (Vickers and Elkin, 2006).
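The net benefit used in decision curve analysis can be sketched as follows. The example assumes a fully observed binary outcome at a fixed horizon (for instance, death from disease within 5 years, yes or no); the handling of censored follow-up, which a survival analysis requires, is deliberately left out. The function names are hypothetical.

```python
def net_benefit(predicted, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk reaches the threshold.

    predicted : predicted event probabilities
    outcomes  : 1 if the event occurred, 0 otherwise (assumed fully observed)
    threshold : threshold probability, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: every patient is classified as high risk."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)
```

Plotting `net_benefit` against a range of thresholds, together with the treat-all reference and the zero line of the treat-none strategy, gives a decision curve; a model is useful at a given threshold when its net benefit exceeds both references.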

For all statistical tests, a two-sided P<0.05 was considered statistically significant. Statistical calculations were performed using SPSS v.18.0 (SPSS, Chicago, IL, USA) and R v.2.6.1 (http://www.R-project.org).

Results

Age, tumour location, bladder cuff removal, pathological T category, pathological N category, tumour grade, and positive surgical margin were associated with CSS on univariate analysis while sex and associated CIS were not. On multivariate analysis, age (c index=0.569), tumour location (c index=0.572), bladder cuff removal (c index=0.603), pathological T category (c index=0.727), and positive surgical margin (c index=0.546) retained independent association with CSS (Table 2).

Table 2 Univariate and multivariate Cox proportional hazard regression analysis of cancer-specific survival

Figure 1 shows the CSS adjusted for the competing risk of death from other causes. The risk of dying from UUT-UC outweighed the risk of dying from other causes.

Figure 1

Predicted probability of cause-specific cumulative incidence of death from other causes and from upper urinary tract urothelial carcinoma.

The discrimination accuracy of the nomograms was 74.2% at both 3 and 5 years. To assess the agreement between the predicted and actual outcomes, we generated calibration curves. The dashed line represents the performance of an ideal nomogram, where the predicted outcome would correspond perfectly with the actual outcome. The performance of the nomogram is plotted as the solid line. The solid line does not lie close to the dashed line of the ideal nomogram. The calibration plots indicate a tendency to overestimate mortality when predicted and observed probabilities are compared at 3 and 5 years (Figure 2).

Figure 2

Calibration plots depicting the correlation between predicted and actual cancer-specific mortality. (A) Nomogram-predicted probability of 3-year cancer-specific survival. (B) Nomogram-predicted probability of 5-year cancer-specific survival. The line containing error bars (95% CI) represents the performance of the models applied to the observed fractions surviving.

Figure 3 illustrates the Kaplan–Meier curves for patients stratified into four groups from the nomograms. Patients were clustered into four groups according to their nomogram-predicted survival (first quartile, <25%; second quartile, 25–50%; third quartile, 50–75%; and fourth quartile, >75%). As depicted, models discriminated well, and log-rank tests were all highly significant.

Figure 3

Patient survival by quartiles of nomogram predictions for (A) 3-year cancer-specific survival (P<0.001) and (B) 5-year cancer-specific survival (P<0.001) (first quartile, <25%; second quartile, 25–50%; third quartile, 50–75%; fourth quartile, >75%).

Figure 4 presents the results of the decision curve analysis at 3 and 5 years. Decision curve analysis revealed that the use of nomograms was associated with net benefit gains relative to the treat-all strategy. When compared with AJCC/UICC staging system, the nomograms performed well across a wide range of threshold probabilities using the decision curve analysis.

Figure 4

Decision curve analysis for 3-year and 5-year cancer-specific mortality. (A) Nomogram-predicted probability of 3-year cancer-specific survival. (B) Nomogram-predicted probability of 5-year cancer-specific survival. In decision curve analysis, the y axis measures net benefit, calculated by summing the benefits (true positives) and subtracting the harms (false positives). The straight line represents the assumption that all patients will die, and the horizontal line represents the assumption that no patients will die. The dotted line indicates the net benefit of using the new model. Model 1 refers to the AJCC/UICC staging system. Model 2 refers to the nomograms.

The advantage of the combined prognostic factors expressed through the nomogram over the AJCC/UICC staging system is shown in Figure 5. Heterogeneity of nomogram predictions within a stage was found particularly in the higher stages.

Figure 5

Distribution of nomogram predictions within each AJCC/UICC stage grouping. (A) Nomogram-predicted probability of 3-year cancer-specific survival. (B) Nomogram-predicted probability of 5-year cancer-specific survival.

Discussion

The outcomes of patients with UUT-UC after radical NU are heterogeneous and difficult to predict. Predictive models may help clinicians to plan treatments and to better inform patients about the options and potential consequences of therapies. A clinician’s prediction of a given clinical outcome is less accurate than that of mathematically based models. Given the relative rarity of UUT-UC, data regarding clinicopathological predictors of outcomes are sparse.

Although risk grouping is a logical approach, grouping patients is an inefficient use of the data and tends to reduce the predictive accuracy of a prognostic model (spectrum bias; Shariat et al, 2008a). The misconception underlying this approach is the assumption that all patients within a risk group are equal. In addition, risk grouping requires the conversion of continuous to categorical variables, which discards information about the actual value. In contrast to risk groups, a nomogram provides an individualised estimate of the predicted probability of the event of interest, which is entirely based on the individual’s disease characteristics, without averaging or combining within a category (Shariat et al, 2008a). The principal advantage of a nomogram is that it provides a survival probability for individual cases.

Some nomograms addressing the outcomes of patients who underwent radical NU have been introduced (Jeldres et al, 2010b; Cha et al, 2012; Rouprêt et al, 2013; Yates et al, 2012). These tools outperform AJCC/TNM staging for the prognosis of survival in internal validation. However, further external validation of the current nomograms in other independent datasets, preferably from different countries or races, should be performed, as differences in population characteristics and treatment may undermine the accuracy and calibration of nomograms. The predictive accuracy of the online nomogram developed by Yates et al (2012) for CSS has been examined previously; in internal validation, the concordance index was 0.78.

The aim of the present study was to externally validate the previously developed nomogram for the prediction of CSM in a different cohort of patients who had undergone radical NU for UUT-UC. To our knowledge, this is the first study to validate a prognostic model for UUT-UC in a non-Caucasian population. We observed good discrimination of clinical outcome by the nomogram: the accuracies of the 3- and 5-year nomograms were both 74.2%. Although the Kaplan–Meier curves lay within the boundaries of the 3- and 5-year predictions for each of the four strata, the calibration plots showed poor correspondence between the actual and ideal nomogram predictions, which does not suggest good calibration of the nomogram in the validation cohort. The performance of the predictive model may be impaired by an uneven distribution of unknown risk-modifying characteristics between the derivation and validation cohorts. Heterogeneity in patient populations (e.g., ethnicity, genetic background, and specific distribution of risk factors) or differences in hospital- and physician-specific treatment strategies and follow-up protocols can therefore lead to poorer calibration than in the derivation cohort (Nuhn et al, 2012). In decision curve analysis, the nomogram demonstrated a better net benefit than the AJCC/TNM staging system for predictions of the examined end points at 3 and 5 years after radical NU. As depicted in Figure 5, nomogram predictions vary widely within stages, especially the higher stages. As nomogram discrimination was significantly superior to that of UICC/AJCC stage grouping, it may be suggested that survival, especially for patients with high-stage tumours, is influenced by factors that are not included in the UICC/AJCC classification.

Undoubtedly, further improvements in predictive models can also be attained by incorporating established risk factors. For example, the Yates nomogram is only for use in patients with urothelial carcinoma and is not suitable for other subtypes such as squamous cell carcinoma. In addition, this nomogram does not include risk factors that have been shown to correlate significantly with the clinical outcome of patients, including bladder tumour history (Milojevic et al, 2012), preoperative hydronephrosis grade (Ito et al, 2011), tumour multifocality (Chromecki et al, 2012), lymphovascular invasion (Akao et al, 2008; Lin et al, 2008; Godfrey et al, 2012), tumour architecture (Fritsche et al, 2012), and tumour necrosis (Lee et al, 2007; Zigeuner et al, 2010). Cha et al (2012) and Rouprêt et al (2013) developed prognostic models that confirmed the importance of lymphovascular invasion and tumour architecture, two strong prognostic variables that have been associated with features of biologically and clinically aggressive UUT-UC.

Combinations of molecular markers improve the prediction of recurrence and survival in patients with advanced urothelial carcinoma of the bladder (pT3-4 or N+) (Shariat et al, 2012) as well as organ-confined disease (pT1-2N0M0) (Zaak et al, 2010). Therefore, novel biomarkers should be incorporated into future prediction models to provide more individualised risk estimates. The ability of biomarker data to improve nomogram predictions has been confirmed (Shariat et al, 2008b). However, the limited availability of such molecular markers might restrict the widespread application of such models. In addition, as racial differences may also exist in the expression of molecular markers, future research should validate these models in populations of different races.

Several limitations of this study merit discussion. First, the retrospective design of this study includes patient data that were retrieved over two decades. We excluded six patients without complete clinical information or whose pathologies were not available, thus introducing a possible selection bias. Second, nomogram prediction is not 100% accurate, a limitation shared with all other prediction tools. Differences between the two patient cohorts exist, which may be considered potential limitations. For example, the validation cohort was more likely to have a higher pathological T category and a lower tumour grade than the nomogram-development cohort, which may have influenced our results. In addition, the duration of follow-up in the validation cohort was longer than that in the nomogram-development population. Late events (or their absence) occurring when patients were censored could influence the survival rates in such a population disproportionately. This may also explain why the predictions calculated using the nomogram indicate a better prognosis than the actual outcome. Third, it is worrisome that 18.6% of patients did not undergo bladder cuff excision, which is a standard of care in patients undergoing radical NU. Lack of removal of the distal ureter and ureteral orifice may predispose a patient to local recurrence. Finally, the lack of standardisation of lymph node dissection represents another concern. The extent of lymph node dissection is not standardised in UUT-UC. In the present study, 83.2% of patients were not staged with a lymph node dissection. Because this may lead to underestimation of the proportion of patients with lymph node metastases, it could explain the overestimation of survival in the validation cohort.

Conclusions

Risk stratification based on current AJCC/TNM staging suffers from significant heterogeneity within each staging group. The online nomogram developed based on the French collaborative national database on UUT-UC is a highly accurate prognostic tool for patients with UUT-UC treated with radical NU. The model can provide an accurate estimate of the individual risk of CSM and outperformed the use of the AJCC/TNM-based predictions with regard to the net benefit. These results encourage the use of this model derived from a European patient cohort for predicting clinical outcome of non-European patients. Nevertheless, further improvements and implementations of novel molecular markers are needed.