Main

The prediction of recurrence after primary surgical treatment in women with cancer is a cornerstone of patient management. In particular, predicting individualised outcome is of major importance for physicians to decide on the treatment options, which follow-up strategies to adopt and how to best counsel the patient. In the field of endometrial cancer (EC), a major cause of morbidity and mortality for women worldwide, most women are diagnosed at an early stage with uterine-confined tumours (Ferlay et al, 2010). However, despite the overall favourable prognosis, some women have aggressive tumours with a substantial risk of recurrence and death (Creasman et al, 1987). The past decade has been marked by several important advances in therapeutic options, such as less morbid surgical approaches and the routine use of chemotherapy and radiotherapy (Karolewski et al, 2006; Colombo et al, 2011). More recently, as for most cancer types, a complementary approach based on prediction models has been developed (Kattan, 2008; Abu-Rustum et al, 2010; Koskas et al, 2011; Bendifallah et al, 2012). Cancer researchers, clinicians and patients are increasingly interested in nomograms, which are defined as a graphical representation of a statistical model to predict a particular end point according to the individual characteristics of the patient (Chun et al, 2006; Bendifallah et al, 2012; Isariyawongse and Kattan, 2012). By providing predictions that are both evidence-based and individualised, these tools can improve medical management and guide the decision-making process (Chun et al, 2006). Kondalsamy-Chennakesavan et al (2012) presented two postoperative nomograms in women with surgically treated stage I–III EC that predict the probability of isolated loco-regional recurrence (ILRR) and distant recurrence (DR) at 3 years. These nomograms are built on clinicopathological parameters and were internally validated using cross-validation and bootstrapping methods. 
The nomograms included the following covariates: age at diagnosis, FIGO stage (2009), grade, lymphovascular invasion, histological type, depth of myometrial invasion and peritoneal cytology. However, external validation in an independent set of women is required to ensure applicability to patients from different institutions (Awtrey, 2012). The aim of this study was to externally validate these recently introduced nomograms in a population of women with EC using large databases of academic cancer centres.

Materials and methods

Patients

In the current study, data for all women with EC who received primary surgical treatment between January 2003 and December 2009 were abstracted from two institutions with prospectively maintained EC databases (Tenon University Hospital and Reims University Hospital, both in France) and from the Senti-Endo trial (Ballester et al, 2011), using the same inclusion criteria as those of the Kondalsamy-Chennakesavan et al (2012) cohort. Electronic medical records and surgery notes were also reviewed. To be included in the validation analysis, women had to have all nomogram variables documented. They were treated with upfront surgery according to the international guidelines. Board-approved gynaecological pathologists assessed the pathological specimens. Histological staging and grading were performed according to the 2009 FIGO classification system (Petru et al, 2009) on the basis of the final evaluation of the pathological specimen. Adjuvant therapy was administered as recommended by multidisciplinary committees in accordance with the French guidelines (Querleu et al, 2011). All women were followed in the institutions’ outpatient departments. Recurrent disease was diagnosed by biopsy or imaging studies. Any woman not presenting for scheduled follow-up visits was contacted. The study was approved by the Institutional Review Boards of both centres.

The Kondalsamy-Chennakesavan nomograms

Kondalsamy-Chennakesavan et al (2012) included patients who underwent primary surgery between 1997 and 2007. Sixteen covariates were evaluated for their prognostic significance and modelled using multivariable competing risks regression to predict 3-year outcomes as part of a nomogram.

Three competing events were recorded as a first event: (i) ILRR, (ii) DR or (iii) death from other causes without recurrence. Isolated loco-regional recurrence was defined as an isolated recurrence on the vaginal vault or within the true pelvis, and DR as recurrence outside the true pelvis irrespective of loco-regional recurrence status. The nomograms included the following covariates: age at diagnosis, FIGO stage (2009), grade, lymphovascular invasion, histological type, depth of myometrial invasion and peritoneal cytology (Kondalsamy-Chennakesavan et al, 2012).

Validation

The discrimination (Hanley and McNeil, 1982) and calibration accuracies of both nomograms were assessed. Discrimination is the ability to differentiate between women with recurrence and those without. It is measured using the receiver operating characteristic curve and summarised by the area under the curve (AUC). An AUC of 1.0 indicates perfect concordance, whereas an AUC of 0.5 indicates no relationship. Calibration is the agreement between the frequency of the observed outcome and the predicted probabilities, and was studied using graphical representations of the relationship between the two (calibration curves). In addition, women were clustered into deciles according to their nomogram score. For each decile group, we calculated the difference between the predicted and the observed ILRR or DR probability. A subgroup analysis was performed according to the European Society for Medical Oncology (ESMO) risk stratifications (Colombo et al, 2011).
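As an illustration of the discrimination measure described above, the following minimal Python sketch computes the AUC as the proportion of concordant pairs (a woman with recurrence receiving a higher predicted probability than a woman without). It is not the code used in the study: the function name `auc` and the example values are hypothetical, and the sketch treats recurrence as a simple binary outcome, ignoring censoring and competing events.

```python
def auc(scores, events):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    scores: predicted probabilities of recurrence.
    events: 1 if recurrence was observed, 0 otherwise.
    A tie between a woman with and a woman without recurrence
    counts as half a concordant pair.
    """
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one woman with and one without recurrence")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted 3-year recurrence probabilities (not study data).
predicted = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30]
recurred = [0, 0, 1, 0, 0, 1]
print(auc(predicted, recurred))  # 6 of 8 pairs are concordant -> 0.75
```

With these illustrative values, six of the eight possible pairs are concordant, giving an AUC of 0.75. A value of 0.5 would correspond to predictions that are no better than chance.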

Other statistical tests

The categorical variables were analysed using the χ² test. Differences were considered significant at a level of P<0.05. All analyses were performed using the R software with the rms, Hmisc and PresenceAbsence packages (http://lib.stat.cmu.edu/R/CRAN).
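For readers unfamiliar with the χ² test applied to categorical variables, a minimal Python sketch for a 2×2 table is given below. It is an assumption-laden illustration rather than the study's analysis: the function name `chi2_2x2` and the counts are hypothetical, no continuity correction is applied (the text does not state whether one was used), and the P value relies on the identity that a χ² survival function with one degree of freedom equals erfc(√(x/2)).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test, without continuity correction, for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts (not study data): a characteristic present in
# 30 of 100 women in one cohort and in 50 of 100 women in another.
stat, p_value = chi2_2x2(30, 70, 50, 50)
print(round(stat, 2), p_value < 0.05)
```

In this hypothetical example the statistic is about 8.33 and the difference would be considered significant at the P<0.05 level used in the study.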

Results

During the study period, 380 women with EC were documented as having received primary surgical treatment. Among them, 271 had all nomogram variables documented and were selected for validation analysis according to the following distribution: Tenon University Hospital (n=77; 28%), Reims University Hospital (n=97; 36%) and Senti-Endo trial (n=97; 36%). The demographics and clinicopathologic characteristics of both the Kondalsamy-Chennakesavan cohort and our validation cohort are outlined in Table 1. The median follow-up and initial recurrence time were 38.1 (range: 22–69) and 22.0 (range: 8.3–55) months, respectively. At the time of the last follow-up, the overall recurrence rate, ILRR rate and DR rate were 13.8% (37 out of 271), 1.8% (5 out of 271) and 11.8% (32 out of 271), respectively. Both cohorts were mainly composed of early-stage EC. There was a significantly higher rate of women with grade 3 and papillary serous/clear cell histological subtypes in the validation cohort and women in this cohort were slightly younger at the time of surgery. Additional differences include a higher rate of lymph node dissection and adjuvant treatment assignment in the validation cohort (55.7% vs 25.8% and 86% vs 35.8%, respectively).

Table 1 Patient characteristics of the Kondalsamy-Chennakesavan (N=2097) and the validation (N=271) cohorts

Validation

AUCs were 0.69 (95% CI, 0.58–0.79) for the ILRR nomogram and 0.66 (95% CI, 0.60–0.71) for the DR nomogram for the whole population (Figure 1). The predicted and the actual probabilities of 3-year ILRR and DR are shown in the calibration plot (Figures 2A and B). The performance of both nomograms appears to be accurate, with a mean error of 5.4% and 4%, respectively, in the whole population according to the decile of risk stratification (Table 2).
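The decile-based comparison of predicted and observed probabilities can be sketched as follows. This is a hedged illustration, not the study's code: the function name `decile_calibration` and the example data are hypothetical, and the observed frequency is computed as a crude proportion of events, whereas the study may have used time-to-event estimates that account for censoring and competing risks.

```python
def decile_calibration(predicted, observed, groups=10):
    """Compare mean predicted probability with observed event frequency.

    Women are sorted by predicted probability and split into `groups`
    groups of (nearly) equal size. Returns a list of
    (mean_predicted, observed_rate, absolute_error) tuples, one per group,
    and the mean absolute error across groups.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one woman per group")
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, abs(mean_pred - obs_rate)))
    mean_error = sum(r[2] for r in rows) / len(rows)
    return rows, mean_error


# Hypothetical data (not study data), split into two groups for brevity.
predicted = [0.1, 0.1, 0.3, 0.3]
observed = [0, 0, 1, 0]
rows, mean_error = decile_calibration(predicted, observed, groups=2)
for mean_pred, obs_rate, err in rows:
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  error {err:.2f}")
print(f"mean error {mean_error:.2f}")
```

In this toy example the lower-risk group has a predicted probability of 0.10 against an observed frequency of 0.00, and the higher-risk group 0.30 against 0.50, giving a mean error of 0.15. A well-calibrated nomogram keeps these per-group differences small across the whole range of predicted risk.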

Figure 1

Receiver operating characteristic curves corresponding to the loco-regional (solid line) and distant recurrence nomograms (dotted line).

Figure 2

Calibration plot of the Kondalsamy-Chennakesavan et al nomograms for the entire cohort of 271 patients. E, difference in predicted and calibrated probabilities between calibration and area under the receiver operating characteristic curve; Emax, maximal error; Eaver, average error. (x-axis, predicted probability using the nomogram; y-axis, actual incidence. The dashed diagonal line represents an ideal nomogram, where the predicted outcome perfectly matches the actual outcome; the solid line represents the nomogram's calibration.)

Table 2 Comparison between predicted and observed recurrence (ILRR and DR) probability for both the Kondalsamy-Chennakesavan and validation populations

However, the performance appears to be heterogeneous according to the ESMO risk stratifications. The subgroup analysis shows acceptable discrimination but poor calibration accuracy (Table 3).

Table 3 Nomogram performance according to the ESMO risk stratification subgroups

Discussion

Recent advances in therapeutic approaches for EC, coupled with the new 2009 FIGO staging system, have led to an increasing interest in individual prediction and risk calculation (Chun et al, 2006; Abu-Rustum et al, 2010; Bendifallah et al, 2012; Isariyawongse and Kattan, 2012). Today, several aspects of prediction, such as the likelihood of recurrence or survival (Abu-Rustum et al, 2010) or lymph node metastases (Bendifallah et al, 2012), can be studied by means of a nomogram. Improving the healthcare management of women with EC by maximising oncologic survival and minimising morbidity has thus become a realistic challenge. The Kondalsamy-Chennakesavan nomograms were externally validated and shown to be partly generalisable to an independent patient population. The predictive accuracy according to discrimination was 0.69 and 0.66 for the 3-year ILRR and DR nomograms, respectively. The correspondence between the observed recurrence rate and the nomogram predictions suggests a moderate calibration in the validation cohort.

Recurrence has been reported to occur in all stages of initial disease in EC, and is uniformly associated with poor survival (Sears et al, 1994; Morrow et al, 1999; Fujimoto et al, 2009). Most events are diagnosed within the first 2 years after surgery (Sears et al, 1994; Fujimoto et al, 2009). The overall recurrence rate of 13.8% observed in the current study, with a median time to first recurrence of 22 months, is consistent with many other studies on stage I–III EC (recurrence rate: 20–25%) (Mariani et al, 2002). Multiple high-risk (HR) factors of recurrence have been identified in apparent early-stage disease (Colombo et al, 2011). Current recurrence models such as the ESMO risk stratification (Colombo et al, 2011) or the Gynecologic Oncology Group (Morrow et al, 1999) system incorporate these factors to provide risk group stratification. Risk grouping by definition distinguishes between low-risk and HR women and is more reliable than risk estimation based on the physician’s judgement. However, its predictive capability is based on the assumption that all women within a given risk group are equal. In practice, heterogeneity of tumour biology and of women’s characteristics within each pathologic subgroup has been observed (Fujimoto et al, 2009). The resulting predictive value of such risk groups may be less accurate and thus less useful for counselling purposes. The two postoperative nomograms were based on the most common evidence-based HR factors and constitute a valuable contribution to improving healthcare for women with EC. By combining evidence-based HR factors, nomograms offer the advantage of condensing the high heterogeneity of the disease into a simple and easily interpretable format.

The observed discrepancy between the original and the current study could be explained by the significant clinical, pathological, surgical and adjuvant treatment differences between the two populations (Table 1). Indeed, the authors state that 25.8% of patients had a lymphadenectomy as part of their surgical staging, and that 35.8% of patients underwent adjuvant therapy (mostly external beam radiotherapy). In our series more than 80% of patients received adjuvant therapy. Furthermore, 55.7% had a lymphadenectomy, which is consistent with the ESMO guidelines (Colombo et al, 2011) even though the role of systematic pelvic lymphadenectomy in early-stage EC is currently under debate (Benedetti Panici et al, 2008; Kitchener et al, 2009). These results underline the potentially heterogeneous surgical and adjuvant management of EC over the last 10 years in different countries. In addition, it may be hypothesised that the low local recurrence rate observed in our study could be due to the complete systematic lymphadenectomy and adjuvant therapy, performed to reduce recurrence rates. Such differences may in turn affect the applicability of the nomogram for our patients and in fine its generalisability. It could be reasonably argued that a potentially heterogeneous predictive model built to predict recurrence events on a data set assembled up to 15 years ago and derived from an Australian population might not be able to accurately predict the end point in a French population if the nomograms do not take into account adjuvant treatment and lymph node status. Moreover, there is some concern about the way in which the parameters were assigned. The prognostic weight of a stage IIIC and positive lymphovascular space invasion (LVSI) patient is paradoxically discordant with the reported literature (Colombo et al, 2011). Indeed, LVSI and FIGO stage IIIC are major independent risk factors for poor outcome in EC (Colombo et al, 2011).
Furthermore, the LVSI rate reported by Kondalsamy-Chennakesavan et al was not consistent with many other studies on stage I–III EC. It is thus difficult to understand what the results of individualised risk calculations for ILRR and DR rates mean in this setting.

In addition to discrimination, we used calibration measurements (Figures 2A and B) to provide better information as to the true accuracy of the models. Predictions for both ILRR and DR were partly well calibrated but the predicted percentages were unsatisfactory when both low-intermediate and HR women were studied (Table 3). Heterogeneity among patient populations (e.g. ethnicity, genetic background and specific distribution of risk factors) or differences in hospital- and physician-specific treatment strategies and follow-up protocols can lead to poor calibration in comparison with the derivation cohort. Therefore, before introducing these predictive tools into daily practice, we believe that they need to be improved by including the lymph node status and the adjuvant therapy information.

Some limitations of the present study have to be underlined. First, the retrospective nature of the study cannot exclude bias. Second, during the period of data collection, modifications in staging modalities and surgical techniques (e.g. pelvic lymph node dissection) occurred. Moreover, there was a relatively small number of patients in the current study, especially when compared with the original model development, which included 2097 patients.

In practice, the real question is whether the model is only partly generalisable because of the heterogeneous population used to validate it or because the original model is inherently unstable.

Conclusion

The challenges we face in improving the healthcare of women with EC could be met by the nomogram approach. However, the current nomograms need to take into account information about adjuvant treatment and lymph node status before being used to identify eligible women for clinical trials or to guide the physician in decisions about post-treatment follow-up. Other external validations based on populations from the United States and Europe would seem to be essential to complete this work.