Introduction

The staging algorithm for soft tissue sarcomas of the retroperitoneum proposed in the 8th edition of the American Joint Committee on Cancer (AJCC) Cancer Staging Manual stratifies patient risk primarily by tumor size. This emphasis is unfortunate, because histologic grade is the most important prognostic factor for predicting patient survival [1,2,3]. Other anatomic and pathologic factors, such as tumor involvement of contiguous organs [1, 4], tumor multifocality [5, 6], and histologic subtype of sarcoma [3,4,5, 7, 8], also appear to contribute important predictive information for patients with retroperitoneal sarcomas.

A recently proposed alternative staging system for retroperitoneal sarcoma (the Vanderbilt staging system) incorporates information regarding the histologic subtype of sarcoma and outperforms the 7th and 8th editions of the AJCC staging system by several statistical criteria [1]. A legitimate criticism of this prior study is that the comparative analyses were performed on the same cohort used to develop the staging algorithm, which may have biased the results in favor of the new system. The current study was therefore performed in an independent patient cohort to validate the Vanderbilt staging system for soft tissue sarcomas of the retroperitoneum.

Patients and methods

The National Cancer Database cohort of patients with retroperitoneal sarcoma

The National Cancer Database (NCDB), established in 1989 as a joint project of the Commission on Cancer of the American College of Surgeons and the American Cancer Society, is a nationwide, comprehensive clinical oncology surveillance dataset representing hospital-based registry data from over 1500 Commission on Cancer-accredited facilities (capturing >70% of newly diagnosed malignancies in the US) [9, 10]. The data used in this study are derived from a deidentified National Cancer Database file. The American College of Surgeons has executed a Business Associate Agreement that includes a data use agreement with each of its Commission on Cancer-accredited hospitals. The Commission on Cancer’s National Cancer Database and participating hospitals are the source of the deidentified data used herein; they have not verified and are not responsible for the statistical validity of the data analysis or the conclusions derived by the authors. Neither has the American College of Surgeons verified the analytic or statistical methodology employed; they are also not responsible for the conclusions drawn from these data by the investigators. Study materials may be obtained from the American College of Surgeons’ Commission on Cancer only by investigators at Commission on Cancer-accredited facilities. The Institutional Review Board at Vanderbilt University Medical Center approved the study protocol and provided a waiver of informed consent.

The National Cancer Database (Participant User File 2017) was queried for all subtypes of retroperitoneal sarcoma. Of the 10,396 cases in the database, 2151 were excluded because surgical resection was either not performed or could not be confirmed. Vital status information is not provided for patients diagnosed after 2014 because follow-up for these patients is limited; these cases were excluded (n = 742). Also excluded were cases with histopathologic diagnoses not recommended for staging by AJCC criteria (n = 313) and cases lacking diagnostic confirmation by histopathologic examination (n = 155). The study was limited to adults, so 115 patients <18 years of age were also excluded. Finally, cases with extreme recorded values for tumor size (>60 cm or <1 cm; n = 63) were dropped as outliers. The final cohort consisted of 6857 cases diagnosed between 2004 and 2014.
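The sequential exclusions above can be checked with simple arithmetic. The following Python sketch replays the reported counts; the step labels are descriptive names introduced here for illustration and are not NCDB variable names.

```python
# Replays the cohort exclusion counts reported in the text.
# Step labels are illustrative descriptions, not NCDB field names.
TOTAL_CASES = 10_396

EXCLUSIONS = [
    ("no confirmed surgical resection", 2151),
    ("diagnosed after 2014 (no vital status)", 742),
    ("histology not staged by AJCC criteria", 313),
    ("no histopathologic confirmation", 155),
    ("age < 18 years", 115),
    ("tumor size > 60 cm or < 1 cm", 63),
]


def remaining_after_exclusions(total, exclusions):
    """Return the running cohort size after each sequential exclusion step."""
    sizes = []
    remaining = total
    for _label, n_excluded in exclusions:
        remaining -= n_excluded
        sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    sizes = remaining_after_exclusions(TOTAL_CASES, EXCLUSIONS)
    for (label, n), size in zip(EXCLUSIONS, sizes):
        print(f"{label:<42} -{n:>5} -> {size}")
```

The running totals end at 6857, matching the final cohort size.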

Statistical analyses

To assess the degree of discrimination between tumor stage categories, Kaplan–Meier overall survival curves were plotted, and adjacent stage categories were compared pairwise using Sidak’s method to account for multiple comparisons. The predictive accuracy of each staging system was evaluated by comparing areas under receiver operating characteristic (ROC) curves generated from logistic regression models of 5-year overall survival. Three concordance indices (Harrell’s c, Somers’ D, and Gönen and Heller’s K) were calculated using bootstrap methods (500 replications) on separate training and validation sets, which were created semi-randomly after sorting on each staging system so that the bootstrapped samples contained similar proportions of each tumor stage [11,12,13,14].
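Sidak’s method adjusts each raw P value for the number of comparisons in a family. A minimal Python sketch of the adjustment is shown below, assuming that the family for a given staging system consists of its adjacent-stage comparisons (number of stages minus one); the function names are hypothetical, and the sketch does not reproduce the hazard-ratio comparisons themselves.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted P values for a family of m comparisons.

    Each raw p is mapped to 1 - (1 - p) ** m, which controls the
    familywise error rate when the m tests are independent.
    """
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]


def sidak_alpha(alpha, m):
    """Per-comparison threshold equivalent to the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)
```

For a system with four stages, for example, m = 3, and a raw P value is judged significant at the familywise 0.05 level only when it falls below about 0.017.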

Harrell’s c is the probability that, of two patients, the one with the longer survival time has the lower predicted hazard, so a non-informative model yields c = 0.5 and a perfectly discriminating model yields c = 1. Somers’ D quantifies the difference between the probabilities that longer-surviving patients have smaller versus larger hazard ratios than patients with shorter survival intervals (D = 2c − 1); its possible range is therefore [−1, 1], with D = 0 for a non-informative model. Because Harrell’s c and Somers’ D cannot use pairs in which the shorter observation is censored before an event occurs, this potential source of bias was minimized by comparing differences in the estimated indices only between nested regression models that excluded patients censored before 5 years of clinical follow-up (n = 2013). Gönen and Heller’s K estimates the probability that a patient with a higher hazard ratio fails earlier than one with a lower hazard ratio from the model parameters and the observed distribution of covariates, and therefore does not depend on the censoring distribution; all study cases were thus used to compute Gönen and Heller’s K, using separate bootstrapping methods.
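To make these definitions concrete, the following Python sketch computes Harrell’s c, Somers’ D, and Gönen and Heller’s K from toy inputs. It is an illustrative implementation written for this explanation, not the Stata routine used in the study: risk scores are assumed to be Cox linear predictors (higher means higher hazard), pairs with tied survival times are simply skipped, and K is computed in its plain form without any smoothing of tied predictors.

```python
import math


def harrell_c(time, event, risk):
    """Harrell's c over usable pairs.

    A pair (i, j) is usable when subject i has an observed death and
    time[i] < time[j]. Pairs whose earlier time is censored are skipped,
    as are pairs with tied times. Tied risk scores count as 1/2.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def somers_d(c):
    """Somers' D rescales c onto [-1, 1]: D = 2c - 1."""
    return 2.0 * c - 1.0


def gonen_heller_k(linear_predictor):
    """Gonen and Heller's K from Cox linear predictors (x_i' beta).

    Only fitted predictors are used, never observed times, so the
    censoring pattern does not enter. Tied predictors contribute 0.
    """
    eta = list(linear_predictor)
    n = len(eta)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = abs(eta[i] - eta[j])
            if diff > 0:
                # Model-based probability that the higher-risk subject fails first.
                total += 1.0 / (1.0 + math.exp(-diff))
    return 2.0 * total / (n * (n - 1))
```

Note that `gonen_heller_k` never looks at survival times or event indicators, which is why censoring does not affect it, whereas `harrell_c` discards every pair in which the earlier observation is censored.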

The amount of variation in observed outcomes (survival intervals) explained by the regression model parameters was assessed using O’Quigley’s ρ²ₖ and Royston’s R² [15, 16]. Standard errors of these estimates were bootstrapped with 500 replications. Bayesian information criteria, with the penalty term based on the number of deaths rather than the overall sample size, were also assessed as measures of model fit [17,18,19,20]. All hypothesis tests were two-sided with α = 0.05. All analyses were performed using Stata version 15 (StataCorp LLC, College Station, TX, USA).
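The explained-variation and model-fit measures above reduce to short formulas once the log-likelihoods of the fitted and null Cox models are available. The Python sketch below assumes the likelihood-ratio form ρ²ₖ = 1 − exp(−LR/k), with k the number of deaths, and a BIC whose penalty uses the number of deaths; Royston’s R² is not reproduced. The generic `bootstrap_se` helper is hypothetical: in the study each bootstrap replicate would involve refitting the models, which is abstracted here into an arbitrary `statistic` function.

```python
import math
import random
import statistics


def rho2_k(loglik_model, loglik_null, n_events):
    """Explained randomness, assumed form 1 - exp(-LR / k).

    LR = 2 * (loglik_model - loglik_null) is the likelihood-ratio
    chi-square; k is the number of observed deaths.
    """
    lr = 2.0 * (loglik_model - loglik_null)
    return 1.0 - math.exp(-lr / n_events)


def bic_events(loglik_model, n_params, n_events):
    """BIC whose penalty uses the number of deaths, not the sample size."""
    return -2.0 * loglik_model + n_params * math.log(n_events)


def bootstrap_se(data, statistic, reps=500, seed=1):
    """Bootstrap SE of statistic(sample) via resampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(estimates)
```

Lower BIC values indicate better fit, so under this criterion the best-fitting staging system is the one with the smallest BIC.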

Results

The National Cancer Database Cohort of retroperitoneal sarcomas

Clinicopathologic characteristics of the entire cohort are provided in Table 1. Mean age was 61 years (SD 13 years; median 62 years; IQR 53–71) and mean tumor size was 17.8 cm (SD 11.0 cm; median 15.5 cm; IQR 9.5–24.0 cm). Median follow-up of censored patients was 53 months (range <1–153 months). Of the 6857 patients, 3076 (45%) died a median of 23 months after surgical resection (range, 1 day–144 months).

Table 1 National Cancer Database cohort of retroperitoneal sarcomas (n = 6857)

The Vanderbilt Staging System for retroperitoneal sarcoma

Derivation of the Vanderbilt staging system for retroperitoneal sarcoma has been described previously [1]. Briefly, tumor size is categorized as T1 (≤10 cm), T2 (>10 cm and ≤15 cm), or T3 (>15 cm). Histologic subtype is classified as favorable (well-differentiated liposarcoma; dedifferentiated liposarcoma; liposarcoma, not otherwise specified; myxofibrosarcoma; fibrosarcoma) or unfavorable (all other histologic subtypes). Stage groupings are defined by these two factors together with histologic grade and the presence of distant metastasis (Table 2). The proposed staging factors appear appropriate in this validation dataset, as shown by plots of predicted hazard ratios for AJCC 8th edition T categories by histologic grade and histologic group, adjusted for patient age, AJCC pM status, history of radiation therapy, and year of diagnosis, and including interaction terms between AJCC T category, histologic grade, and histologic group (Fig. 1). Although the plots suggest interactions between tumor size and grade and between tumor size and unfavorable histology, none of these interaction terms was statistically significant in separately analyzed regression models (data not shown). It is also noteworthy that the AJCC T categories discriminate poorly between risks of death, with minimal contrast among categories T1–T3 (also illustrated in Supplementary Figures S1 and S2).
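The size and histology factors described above can be expressed as simple classification rules. The Python sketch below assumes free-text subtype labels matching the names listed in the text; actual registry records use coded histology, so the label set is hypothetical. Stage grouping itself depends on Table 2 and is not reproduced here.

```python
# Hypothetical free-text labels; real registry data use coded histology.
FAVORABLE_SUBTYPES = frozenset({
    "well-differentiated liposarcoma",
    "dedifferentiated liposarcoma",
    "liposarcoma, not otherwise specified",
    "myxofibrosarcoma",
    "fibrosarcoma",
})


def vanderbilt_t_category(size_cm: float) -> str:
    """T1 (<=10 cm), T2 (>10 and <=15 cm), T3 (>15 cm)."""
    if size_cm <= 0:
        raise ValueError("tumor size must be positive")
    if size_cm <= 10:
        return "T1"
    if size_cm <= 15:
        return "T2"
    return "T3"


def histologic_group(subtype: str) -> str:
    """Favorable for the listed subtypes, unfavorable for all others."""
    if subtype.strip().lower() in FAVORABLE_SUBTYPES:
        return "favorable"
    return "unfavorable"
```

The boundaries are inclusive at the upper end of each category, so a 10 cm tumor is T1 and a 15 cm tumor is T2.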

Table 2 The Vanderbilt staging system for retroperitoneal sarcoma
Fig. 1

Predicted hazard ratios for AJCC 8th edition T categories adjusted for patient age, AJCC pM status, history of radiation therapy, and year of diagnosis as well as interactions with histologic grade and histologic group (AJCC, American Joint Committee on Cancer)

Comparison of the Vanderbilt and American Joint Committee on Cancer 7th and 8th edition staging systems for retroperitoneal sarcoma

Kaplan–Meier plots for overall survival reveal minimal separation between stages IA and IB and between stages II and IIIA in the AJCC 8th edition system (Fig. 2). The AJCC 7th edition system shows poor separation between stages IA and IB and between stages IIA and IIB. These findings are not surprising, since these stage distinctions are based exclusively on dichotomization of tumor size. In contrast, the Vanderbilt staging system shows good discrimination between all adjacent stages except stages IA and IB. Pairwise comparisons of hazard ratio coefficients for adjacent tumor stages in each staging system confirm these impressions (Table 3).

Fig. 2

Kaplan–Meier plots of overall survival by AJCC 7th and 8th edition and Vanderbilt staging algorithms for retroperitoneal sarcoma (AJCC, American Joint Committee on Cancer)

Table 3 Pairwise comparisons of adjacent tumor stages for American Joint Committee on Cancer and Vanderbilt staging systems for retroperitoneal sarcoma

The capacity of each staging system to predict 5-year overall survival by logistic regression was assessed using the subset of cases with complete information for each staging system (n = 6140). Cases censored before 5 years of clinical follow-up (n = 2013) were also excluded. In this subset (n = 4127), 2309 (56%) patients died within 5 years of surgical resection. Comparison of the areas under receiver operating characteristic curves showed that the Vanderbilt system was more accurate (70 ± 0.8%) than the AJCC 8th edition staging system (67 ± 0.8%; P < 0.00005) and the AJCC 7th edition staging system (68 ± 0.8%; P = 0.01) (Fig. 3). As noted previously, the predictive accuracy of the new AJCC 8th edition was also lower than the prior 7th edition (P < 0.01).
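For a staging system entered as a categorical predictor, the area under the ROC curve can be read as the probability that a randomly chosen patient who died within 5 years received a higher predicted risk than a randomly chosen survivor. The Python sketch below computes this Mann–Whitney form of the AUC, counting ties as one half. It assumes stage is entered as a set of indicator variables, in which case the logistic model’s fitted probabilities equal the observed per-stage death proportions (provided no stage has 0% or 100% mortality); the function names are hypothetical.

```python
from collections import defaultdict


def auc_mann_whitney(scores, died):
    """AUC = P(score of a patient who died > score of a survivor); ties = 1/2."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stage_death_rates(stages, died):
    """Observed 5-year death proportion for each stage label."""
    deaths = defaultdict(int)
    counts = defaultdict(int)
    for stage, d in zip(stages, died):
        counts[stage] += 1
        deaths[stage] += int(bool(d))
    return {stage: deaths[stage] / counts[stage] for stage in counts}
```

Scoring each patient by their stage’s death rate and passing those scores to `auc_mann_whitney` gives the AUC of the indicator-coded logistic model under the stated assumption; because patients within a stage share a score, many pairs are ties, which caps the achievable AUC for coarse staging systems.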

Fig. 3

Receiver operating characteristic curves for 5-year overall survival by AJCC and Vanderbilt staging systems. A model consisting of distant metastasis alone is included for comparison. The calculated predictive accuracy (with standard error) of each staging system is given in parentheses in the figure legend (AJCC, American Joint Committee on Cancer)

Concordance analysis

The Vanderbilt staging system showed significantly greater concordance with clinical outcomes than the AJCC 8th edition staging system for each index calculated (Table 4). Concordance was also higher for the Vanderbilt staging system than for the AJCC 7th edition staging system for two of the three indices. The Vanderbilt system also accounted for a greater proportion of explained variation than the AJCC 8th edition staging system, and assessment of Bayesian information criteria confirmed that it generated the best-fitting model. Furthermore, the previous 7th edition of the AJCC staging system outperformed the new 8th edition on several of these measures, indicating that the revised staging algorithm is inferior to its predecessor.

Table 4 Concordance indices, measures of explained variation, and Bayesian information criterion for American Joint Committee on Cancer 7th and 8th edition staging systems compared to the Vanderbilt staging system for retroperitoneal sarcoma

Discussion

The 8th edition of the AJCC Cancer Staging Manual separates staging algorithms for soft tissue sarcoma by anatomic site: extremities and trunk, retroperitoneum, and head/neck [21]. Although this would intuitively seem to be a major improvement in the staging of sarcomas, at least two of these new staging systems fail to improve on the prior non-site-specific edition [1, 22, 23]; by contrast, preliminary evidence suggests that the revised T categories for head/neck sarcomas do improve prognostication for this anatomic site (Cates, JM, unpublished observations, 2017). I (JMMC) recently proposed a revised staging system (the Vanderbilt system) for retroperitoneal sarcomas using data extracted from the Surveillance, Epidemiology, and End Results Program database [1]. However, in that study, performance evaluations were carried out on the same dataset used to develop the system, arguably biasing the results. Here, the Vanderbilt staging system is validated using an independent dataset (the National Cancer Database).

The Vanderbilt system outperforms the AJCC 8th edition staging system primarily because it relies less on categorization of tumor size—an inherently continuous variable without discrete increases in risk for incremental increases in size—and more on histologic grade and histologic subtype of sarcoma to stratify patients according to risk of death. It is well known that histologic grade is one of the predominant factors that predicts the outcome of patients with retroperitoneal sarcoma [1,2,3]. Moreover, different histologic subtypes of retroperitoneal sarcoma show inherently different clinical behavior unaccounted for in the AJCC staging algorithm [3,4,5, 7, 8].

Other anatomic factors not represented in any staging system currently in widespread use, such as tumor involvement of contiguous organs or multifocality of tumor deposits, may also predict patient outcomes [1, 4,5,6]. Unfortunately, the NCDB does not record data regarding multifocality. Similarly, data entries for tumor extension in the Collaborative Stage Data Collection System (https://cancerstaging.org/cstage) used by the NCDB are not robust enough to evaluate this potential prognostic factor. It is therefore regrettable that, instead of coding this factor more rigorously, the NCDB has decided not to record tumor extent at all for cases entered after 2016 [24].

Another drawback to the NCDB is that data on local recurrence, development of distant metastasis, and cause of death for each patient are not available, precluding analysis of local recurrence-free, disease-free, or disease-specific survival [9, 10]. It may also be argued that the lack of central pathology review by expert pathologists compromises the reliability of histologic diagnoses (particularly rare ones) and assigned tumor grades. However, distinction between the unfavorable and favorable subtypes of retroperitoneal sarcoma is not overly complicated [1].

Summary

Multiple statistical analyses indicate that the Vanderbilt staging system is a better method of risk stratification for patients with retroperitoneal sarcoma than either the 7th or 8th edition of the AJCC staging system. This staging system has now been validated in a separate large cohort of patients from the NCDB. Although development of site-specific staging systems for soft tissue sarcoma was an appropriate modification, the first iterations in the 8th edition of the AJCC Cancer Staging Manual can certainly be improved [1, 22, 23]. The Vanderbilt system for retroperitoneal sarcoma incorporates information regarding histologic subtype, but other factors such as measures of regional tumor extension or multifocality may also be worthy of consideration. AJCC committees should verify whether proposed changes in revised staging algorithms improve prognostic performance before their approval and implementation.