Main

Soft tissue sarcomas (STSs) account for approximately 1 to 2% of all adult cancers. Although local control can be obtained through the use of surgery plus radiotherapy, up to 50% of patients will recur at distant sites (Clark et al, 2005). At the metastatic stage, palliative chemotherapy can be considered a reasonable option in the majority of cases. Ifosfamide and doxorubicin are the most active single agents in the treatment of STS. In general, their toxicities are manageable but real, which makes treatment difficult (Sleijfer et al, 2005; Benjamin 1987). However, most if not all patients will ultimately relapse and die of their disease. The median overall survival (OS) is currently 12 months and the median time to progression is 3 months. Given these facts, three options can be considered for the treatment of patients with advanced/metastatic STS: (1) the combination of doxorubicin and ifosfamide, if resection of metastasis looks feasible or in symptomatic patients with rapidly progressing tumours; (2) single-agent chemotherapy; and (3) exclusive best supportive care (Blay et al, 2003; Benjamin 1987). In everyday practice, many factors are integrated in the decision-making process: general condition of the patient, underlying co-morbidities, medical history, resectability of metastasis, patient choice and knowledge of prognostic factors (PFs).

One of the most important factors that should be taken into account is the risk that a patient will die early after initiation of treatment. Particularly in patients with a high risk of dying early, it is likely that best supportive care should be preferred over systemic therapy with its accompanying toxicities.

In addition to patient care, insight into factors associated with early death after initiation of treatment is crucial for the design of studies, as the lack of reliable guidance for predicting life expectancy is likely to introduce bias in these clinical studies. Moreover, the known PFs for outcome have to be taken into account in the design of clinical trials, for example, as stratification factors at entry for randomised clinical trials (Simon and Altman, 1994).

The literature shows that life expectancy is most often overestimated, which frequently results in overtreatment (Maltoni et al, 1994; Christakis and Lemont, 2000; Penel et al, 2008; Clément-Duchêne et al, 2010). For example, although a life expectancy of <3 months is an exclusion criterion for all phase 1 studies, without reliable guidance, 20–30% of patients enrolled in phase 1 trials in expert centres died within the first 90 days (Arkenau et al, 2008, 2009; Penel et al, 2008). Despite the high need for models establishing the risk of early death and a large body of literature on prognostic models for cancer patients, only a few of these models are suitable for daily decision making. Before implementation in clinical practice, such prognostic models need to be identified and subsequently validated in independent series. Additionally, such models should be robust and simple, in order to be easily applied in daily practice.

Given the severity of advanced STS and the limited therapeutic options, we carried out a retrospective exploratory analysis to (1) develop prognostic models for early death in this population and (2) validate the models in an independent data set. For the purpose of this study, two complementary approaches were used for the development of the models (a scoring system derived from logistic regression analysis and an algorithm derived from decision-tree analysis). The models were developed with the Soft Tissue and Bone Sarcoma Group (STBSG) data set and subsequently validated with the French Sarcoma Group (FSG) data set.

Materials and methods

Data sets

We used two data sets. The first was assembled by the STBSG and includes patients treated with single agents and combination regimens as first-line treatments between January 1976 and October 2001. The second was built by the FSG and includes patients treated with combination regimens (MAID and intensified MAID) between January 1994 and October 2008 (Fayette et al, 2009; Bui et al, 2009).

Primary end point

The primary end point was 90-day mortality (early death). This threshold is believed to be relevant in decision making for advanced cancer patients in whom the choice between chemotherapy and best supportive care needs to be discussed (Sessa et al, 1996; Geraci et al, 2006; Kelly et al, 2007; Penel et al, 2009).

Development of the models

The development of these models used the STBSG database. The potential PFs were: gender, age, performance status (WHO-PS), histological subtype (angiosarcoma, fibrosarcoma, leiomyosarcoma, malignant fibrous histiocytoma, malignant peripheral nerve sheath tumour, rhabdomyosarcoma, synovial sarcoma, unclassified STS, unknown subtype and a group containing the remaining rare subtypes), histological grade, tumour location (abdominal, breast, gynaecological, head and neck, lower limbs, skin, trunk, upper limbs, visceral, other locations and unknown), time interval between initial diagnosis of STS and start of systemic therapy, presence of lung metastasis, presence of liver metastasis, presence of bone metastasis and presence of other metastases. The two continuous variables (age and time interval since initial diagnosis of STS) were recoded according to the observed quartiles. All these items were collected in both databases.

Two complementary approaches, logistic regression analysis and classification analysis by χ2 automatic interaction detection (CHAID) (Melchior et al, 2001; Barton et al, 2005; Chan et al, 2006; Ambalavanan et al, 2006; Courville et al, 2009), were used for development of the prognostic models (scoring system and decision tree, respectively).

A separate logistic regression analysis with stepwise selection of variables (at P<0.05) identified the variables associated with early death in the entire STBSG cohort. The odds ratios (ORs) from the logistic regression model were converted into points by dividing each OR by the smallest OR in the model. Next, an overall score was assigned to each patient by summing the points they received for each of the PFs.
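The conversion of ORs into points and the summing of points per patient can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the factor names and the example ORs are hypothetical, and rounding the ratio to the nearest integer is an assumption. A patient with an unknown factor receives no score, which mirrors the rule used in this study when the score was applied to the validation data set.

```python
from typing import Dict, Optional


def ors_to_points(odds_ratios: Dict[str, float]) -> Dict[str, int]:
    """Divide each OR by the smallest OR; rounding to whole points is an assumption."""
    smallest = min(odds_ratios.values())
    return {factor: round(value / smallest) for factor, value in odds_ratios.items()}


def patient_score(present: Dict[str, Optional[bool]], points: Dict[str, int]) -> Optional[int]:
    """Sum the points of the factors a patient presents; None if any factor is unknown."""
    if any(present.get(factor) is None for factor in points):
        return None
    return sum(pts for factor, pts in points.items() if present[factor])


# Hypothetical adjusted ORs for three illustrative factors (not study results).
example_ors = {"factor_a": 4.2, "factor_b": 1.5, "factor_c": 2.9}
points = ors_to_points(example_ors)
score = patient_score({"factor_a": True, "factor_b": False, "factor_c": True}, points)
missing = patient_score({"factor_a": True, "factor_b": None, "factor_c": True}, points)
```

Dividing by the smallest OR gives the weakest retained factor exactly one point, so every other factor is expressed as a multiple of it.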

In the current study, CHAID was used as a complementary method; this technique uses a systematic algorithm to detect the strongest association between the potential PFs (each selected PF being named a ‘splitter’) and the outcome variable (here, early death). Step by step, the CHAID algorithm recursively partitions the data into mutually exclusive, exhaustive subsets that are maximally different in the dependent variable, as assessed with Bonferroni-adjusted χ2 statistics. Here, the CHAID algorithm consisted of three successive actions: (1) merge the subgroups with a similar occurrence of the target variable (αmerge=0.01), (2) split the subgroups using the best PF (αsplit=0.01) and (3) terminate the tree when the observed number of early deaths was 30 (Melchior et al, 2001; Barton et al, 2005; Chan et al, 2006; Ambalavanan et al, 2006; Courville et al, 2009).
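A single split-selection step in the spirit of CHAID might look like the sketch below. It is a simplification under stated assumptions and not the software used in the study: the category-merging step is omitted, the Bonferroni multiplier is taken to be the number of candidate predictors (full CHAID derives it from the number of possible category merges), the P-value helper only covers predictors with two or three categories, and all names and counts are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# One row per category of a predictor: (early deaths, survivors).
Table = List[Tuple[int, int]]


def chi2_statistic(table: Table) -> float:
    """Pearson chi-square statistic for a (categories x 2) contingency table."""
    total = sum(d + s for d, s in table)
    total_deaths = sum(d for d, _ in table)
    total_surv = total - total_deaths
    stat = 0.0
    for deaths, surv in table:
        row = deaths + surv
        for observed, col_total in ((deaths, total_deaths), (surv, total_surv)):
            expected = row * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_p_value(stat: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles predictors with 2 or 3 categories")


def best_splitter(candidates: Dict[str, Table], alpha_split: float = 0.01) -> Optional[Tuple[str, float]]:
    """Return (name, adjusted P) of the strongest predictor, or None if none passes alpha_split."""
    best = None
    for name, table in candidates.items():
        p = chi2_p_value(chi2_statistic(table), df=len(table) - 1)
        adjusted = min(1.0, p * len(candidates))  # simplified Bonferroni multiplier
        if best is None or adjusted < best[1]:
            best = (name, adjusted)
    return best if best is not None and best[1] <= alpha_split else None


# Hypothetical counts, for illustration only (not study data).
candidates = {
    "performance_status": [(10, 290), (30, 270), (60, 140)],
    "liver_metastasis": [(70, 530), (30, 170)],
}
chosen = best_splitter(candidates)
```

In a full tree, the same step would be repeated within each resulting subset until no predictor passes the split threshold or the stopping rule is met.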

Furthermore, the prognostic accuracy (and its 95% confidence interval (95% CI)) was tabulated for both models (scoring system and decision-tree analysis) using a classical 2 × 2 table. The optimal threshold for each prognostic model was established using classical receiver operating characteristic analysis to maximise both sensitivity and specificity (Linden 2006). Accuracy is the proportion of correctly classified patients.
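The 2 × 2 table and the threshold search can be illustrated with a short sketch. It rests on assumptions that the text does not spell out: a score at or above the threshold is taken to predict early death, and "maximise both sensitivity and specificity" is implemented as maximising their sum (the Youden index). The function names and the example data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def two_by_two(scores: Sequence[int], died_early: Sequence[bool], threshold: int) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN), assuming score >= threshold predicts early death."""
    tp = fp = fn = tn = 0
    for score, died in zip(scores, died_early):
        predicted = score >= threshold
        if predicted and died:
            tp += 1
        elif predicted:
            fp += 1
        elif died:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy and predictive values from the four cells of the table."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def youden_threshold(scores: Sequence[int], died_early: Sequence[bool]) -> int:
    """Threshold that maximises sensitivity + specificity - 1."""
    best_threshold, best_j = min(scores), -1.0
    for threshold in sorted(set(scores)):
        tp, fp, fn, tn = two_by_two(scores, died_early, threshold)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold


# Hypothetical scores and outcomes, for illustration only.
scores = [0, 1, 2, 3, 6, 7, 2, 8, 0, 3]
died = [False, False, False, True, True, False, False, True, False, False]
threshold = youden_threshold(scores, died)
result = metrics(*two_by_two(scores, died, threshold))
```

Because accuracy weights both cells of correct classification by their frequency, it can look high in a cohort with few early deaths even when the positive predictive value is low.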

Validation and head-to-head comparison of both models

Both models were applied to the independent data set (FSG database). The score was calculated for each patient according to the observed presence of PFs (a patient with any missing source variable was assigned a missing score). The entire population was split according to the proposed CHAID algorithm. The performance of both models was assessed using the area under receiver operator curve and the accuracy, as previously specified.
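The area under the curve for a score can be computed directly from pairs of patients, since it equals the probability that a randomly chosen patient who died early has a higher score than a randomly chosen patient who did not (ties counting one half). The sketch below is a hedged illustration with hypothetical names and data; patients with a missing score are skipped, and no confidence interval is computed.

```python
from typing import Optional, Sequence


def auc(scores: Sequence[Optional[float]], died_early: Sequence[bool]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pairs = [(s, d) for s, d in zip(scores, died_early) if s is not None]
    cases = [s for s, d in pairs if d]
    controls = [s for s, d in pairs if not d]
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical validation data; None marks a patient whose score is missing.
val_scores = [0, 2, None, 3, 6, 1, 2]
val_died = [False, False, True, True, True, False, True]
val_auc = auc(val_scores, val_died)
```

Skipping patients without a score means the estimate describes only the subset to which the model is applicable, which is why the proportion of scorable patients matters when two models are compared.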

Results

Patient characteristics

The main patient characteristics are depicted in Table 1. The STBSG cohort included 3002 patients; 2898 (96.5%) of them were assessable for the early death rate. The rate of early death was 251 out of 2898 (8.6% (95% CI: 7.6–9.6)). In this cohort, the median OS was 348 days. Out of the 404 patients from the FSG cohort, 399 (98.7%) were assessable for the early death rate. The rate of early death was 18 out of 399 (4.5% (2.4–6.5)). The median OS of this cohort was 788 days.
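The reported rates and intervals can be checked approximately with a normal-approximation (Wald) interval. The text does not state which interval method was used, so the choice of method is an assumption, as are the function and variable names; the counts are those given above.

```python
import math
from typing import Tuple


def wald_ci(events: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Proportion and normal-approximation 95% CI bounds, all in percent."""
    p = events / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return 100 * p, 100 * (p - half_width), 100 * (p + half_width)


stbsg = wald_ci(251, 2898)  # development cohort counts from the text
fsg = wald_ci(18, 399)      # validation cohort counts from the text
```

Under this assumption the bounds come out within about 0.1 percentage point of the reported intervals for both cohorts.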

Table 1 The characteristics of patients in both cohorts

Development of the scoring system

The following parameters were associated with the risk of early death in univariate analysis: age 60 (OR=1.77, P=0.003), PS=1 (OR=3.04, P<0.0001), PS ≥2 (OR=10.00, P<0.0001), time interval since the initial diagnosis of sarcoma 540 days (OR=0.41, P<0.0001), presence of liver metastasis (OR=2.37, P=0.0041) and presence of other metastases (OR=2.00, P=0.0061; Table 2). The final logistic regression analysis retained the following PFs for early death: PS=1 (P<0.0001), PS ≥2 (P<0.0001), presence of liver metastasis (P=0.0014) and presence of other metastases (P=0.0055). For the attribution of points in the scoring system, each adjusted OR was divided by the smallest one, which was that of the presence of metastases other than lung, bone and liver (OR=2.00). Thus, PS=1 gave 2 points (3.83/2.00), PS ≥2 gave 6 points (12.00/2.00), presence of liver metastasis gave 1 point (2.37/2.00) and presence of other metastases also gave 1 point (2.00/2.00). As a result, the sum of points ranged from 0 to 8 and the risk of early death from 3 to 40% (Table 3). The area under receiver operator curve was 0.69 (0.66–0.73). The optimal threshold of this scoring system was set at 3. Using this threshold, the prognostic accuracy was 91.0% (90.5–92.7), the positive predictive value was 22.0% (17.3–27.3) and the negative predictive value was 88.7% (87.4–89.4).
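The point attribution can be written out as a worked calculation. Two assumptions are made here: the ratios are rounded to the nearest integer, which is inferred from the reported points, and the highest PS category is read as a PS of 2 or more. The notation is introduced for illustration only.

```latex
\begin{align*}
\text{points}_i &= \operatorname{round}\!\left(\frac{\mathrm{OR}_i}{\mathrm{OR}_{\min}}\right),
  \qquad \mathrm{OR}_{\min} = 2.00 \\
\text{PS}=1 &: \; 3.83 / 2.00 = 1.915 \;\rightarrow\; 2 \\
\text{PS}\geq 2 &: \; 12.00 / 2.00 = 6.00 \;\rightarrow\; 6 \\
\text{liver metastasis} &: \; 2.37 / 2.00 = 1.185 \;\rightarrow\; 1 \\
\text{other metastases} &: \; 2.00 / 2.00 = 1.00 \;\rightarrow\; 1 \\
\text{maximum score} &= 6 + 1 + 1 = 8
\end{align*}
```

The two PS categories are mutually exclusive, so a patient can receive at most 6 points for PS, which is why the maximum is 8 rather than 10.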

Table 2 Risk factors for early death (development data set)
Table 3 Early death rates according to the score and score accuracy in both cohorts

Development of the decision tree

The CHAID analysis provided a very simple algorithm. In the decision tree, the most powerful discriminator (splitter) was the PS; three subsets of patients were discriminated with increasing risk of early death: patients with PS=0 (early death rate: 3.3%), patients with PS=1 (early death rate: 9.4%) and patients with PS ≥2 (early death rate: 25.5%). No discriminator was able to split the first two categories of patients further. In the development data set, among patients with PS ≥2, the histological grade distinguished two subsets of patients; when the grade was 3, the rate of early death was 36.3% and in the other situations, the rate of early death was 19.5% (Figure 1). The area under the receiver operator curve was 0.67 (0.64–0.71). The optimal classification was based on the separation of patients with PS=(0–1) from other patients (Table 4). Using this classification, the prognostic accuracy was 86.2% (84.5–87.4), the positive predictive value was 25.3% (20.4–30.1) and the negative predictive value was 93.3% (92.0–94.6).
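The tree described above is small enough to be written as a lookup. The sketch below encodes the development-cohort leaf rates given in the text; the function names are hypothetical, the highest PS category is read as a PS of 2 or more, and falling back to the parent-node rate when the grade is unknown is an assumption of this sketch, not a rule stated in the study.

```python
from typing import Optional


def chaid_leaf_rate(ps: int, grade: Optional[int] = None) -> float:
    """Development-cohort early death rate (%) of the leaf a patient falls into."""
    if ps == 0:
        return 3.3
    if ps == 1:
        return 9.4
    # PS of 2 or more: split further on histological grade when it is known.
    if grade is None:
        return 25.5  # assumption: use the parent-node rate if grade is missing
    return 36.3 if grade == 3 else 19.5


def predicted_high_risk(ps: int) -> bool:
    """Optimal classification reported in the text: PS 0-1 versus the rest."""
    return ps >= 2
```

For example, `chaid_leaf_rate(2, grade=3)` returns the 36.3% leaf, while `predicted_high_risk(1)` is false because the optimal classification separates only PS 0–1 from higher values.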

Figure 1 CHAID algorithms. (A) STBSG data set and (B) FSG data set.

Table 4 Early death rate according to probability tree analysis and algorithm accuracy in both cohorts

Validation and head-to-head comparison of both models

Both models were applied to the FSG cohort. The scoring system was applicable to 249 out of 404 patients (61.6%). In this cohort, the area under receiver operator curve was 0.68 (0.52–0.83). In this validation cohort, the prognostic accuracy of the scoring system, with a threshold set at 3, was 67.7%. The CHAID algorithm was applicable to 347 out of 404 patients (85.9%). Among patients with PS ≥2, the rates of early death were similar whatever the histological grade. In this cohort, the area under receiver operator curve was 0.72 (0.58–0.86) and the prognostic accuracy of the CHAID algorithm was 89.0% (86.5–93.5). In the validation cohort, the prognostic accuracy of the CHAID algorithm was thus superior to that of the scoring system.

Because PS appeared as the most important PF in both models, we evaluated the accuracy of PS alone as a PF for early death (see Figure 2). In the validation cohort, the area under the receiver operator curve of PS was similar to that of both other models: 0.72 (0.58–0.86).

Figure 2 Area under receiver operator curves. (A) STBSG database. Grey dotted line: logistic regression-based scoring; black dotted line: χ2 interaction detection algorithm; black line: performance status alone as prognostic model (this line completely covers the CHAID algorithm line). (B) FSG database. Grey dotted line: logistic regression-based scoring; black dotted line: χ2 interaction detection algorithm; black line: performance status alone as prognostic model (this line completely covers the CHAID algorithm line).

Discussion

In both cohorts, which consisted of patients treated in the context of clinical trials with clearly predefined eligibility criteria, the rates of early death were relatively low (8.6 and 4.5%). The rate of early death was significantly lower in the most recent trials (those of the French Sarcoma Group), which explored the role of poly-chemotherapy (MAID regimen) and the role of intensive chemotherapy. Hence, it is not surprising that the rate of early death was lower in trials with more stringent eligibility criteria. The rate of early death in this population is lower than that reported among patients entering phase 1 trials (16%) or in patients with carcinoma of unknown primary (30%) (Geraci et al, 2006; Kelly et al, 2007; Arkenau et al, 2008; Penel et al, 2008, 2009; Ferté et al, 2010). The present analysis shows that PS is the most powerful PF for early death among patients with advanced STS treated with first-line systemic therapy. Using two complementary approaches, we developed and validated two prognostic models, both of which were, however, essentially based on the assessment of PS. The scoring system was more complex, incorporating three parameters, and was therefore applicable to a smaller part of the study population. The CHAID algorithm was based on two variables (PS and histological grade) and was applicable to a larger proportion of patients. Applying the CHAID algorithm to the validation data set showed that the second splitter (histological grade) did not improve the discrimination obtained with the first splitter (PS). Lastly, the prognostic performance of PS alone was as good as that of the two more sophisticated models (Figure 2).

The list of factors associated with early death identified in the present study is not surprising, and most of the identified factors have previously been reported in other advanced STS databases as being related to outcome. The presence of extra-pulmonary and especially liver metastasis has been reported as a PF for OS in non-pretreated patients (Karavasilis et al, 2008; Ray-Coquard et al, 2009). Grade has also been identified as a PF for OS in non-pretreated patients (Antman et al, 1993). The time interval since the initial diagnosis of STS has been identified as a PF for OS in non-pretreated patients by Maurel et al (2009). Performance status appeared as a PF for OS in most studies (Borden et al, 1987; Karavasilis et al, 2008; Maurel et al, 2009; Ray-Coquard et al, 2009). The database used for the development of these predictive models for early death had previously been used for the analysis of other PFs. Van Glabbeke et al (1999) found that the following factors were associated with OS in patients who received doxorubicin: PS (HR=1.51, P<0.0001), liver metastasis (HR=1.46, P<0.0001), histological grade (HR=1.24, P=0.0002), time since initial diagnosis (HR=0.92, P=0.0004) and age (HR=1.10, P=0.0045). In the same study, the following parameters were associated with objective response: liver metastasis (OR=0.38, P<0.0001), age (OR=0.83, P=0.0024) and histological grade (OR=1.35, P=0.051). Blay et al (2003) identified the following parameters as factors associated with long-term survival: PS (OR=2.02, P=0.0235), grade (OR=2.18, P=0.0009), female sex (OR=0.43, P=0.0291) and complete response obtained after first-line treatment (OR=0.11, P=0.0001). The originality of this work is that the relative weight of each PF and their interactions were studied in a very large data set using two complementary approaches, and the identified factors were validated in an independent cohort. In the end, the prognostic value of PS outweighs that of all other clinical parameters.

The limitations of this study are related to its retrospective nature. For example, grade scoring and pathological classification changed during the study period. The prognostic value of grade scoring is not established for some particular histological sub-types (such as clear cell sarcoma, epithelioid sarcoma, angiosarcoma and so on). Moreover, the parameters used herein are basic clinical variables. For other tumour types, more elaborate and sophisticated scores have been published, which integrate other parameters such as lymphocyte count, LDH level or albumin (Geraci et al, 2006; Kelly et al, 2007; Arkenau et al, 2008; Penel et al, 2008, 2009). These parameters were not available in a sufficiently large section of the STBSG cohorts and could not be explored in the present study. The additional value of these biological parameters should be evaluated. Furthermore, one could argue that the development of a prognostic model for early death might be more relevant in patients who fail first-line chemotherapy and are candidates for second-line treatment. We plan this second analysis. Moreover, we do not know the precise cause of death in either database, especially the incidence of toxic death vs death caused by progressive disease.

Despite its subjective nature, estimation of general condition by PS remains one of the most powerful PFs in advanced/metastatic STS patients. Nevertheless, PS is not suitable for disabled patients and/or patients who suffer from persistent pain, for example, patients previously treated with a mutilating surgical procedure. There are several scales available for scoring PS: the WHO-PS scale, the ECOG-PS scale or the Karnofsky PS scale. These different scales can generate discrepancies in the evaluation of PS. It is well known that physicians overestimate both PS and life expectancy (Parkes, 1972; Evans and McCarthy, 1985; Christakis and Lemont, 2000; Ando et al, 2001). Ando et al (2001) demonstrated that nurses and patients themselves estimate the actual PS more accurately than physicians.

Nevertheless, given its prognostic value, this variable has to be taken into account in future clinical trials, for example, as a stratification factor at inclusion in randomised trials. For clinical decision making, both models developed are far from ideal. The probability of early death for a patient with PS ≥2 is 11–24% compared with 0–3% for those with PS=0 (Figure 1). In everyday practice, this does not imply that all patients with PS ≥2 should be denied palliative chemotherapy per se, but the high risk of an early death and the potential lack of benefit from treatment should be discussed with the patient before a decision is made to proceed with chemotherapy. Further studies are warranted to develop and validate more accurate prognostic models, if possible based on objectively measurable variables (such as biological parameters) and excluding PS.