Main

Advanced hepatocellular carcinoma (HCC) had been a disease with no proven treatment until sorafenib was demonstrated to provide survival benefits in two randomised studies in 2007 (Llovet et al, 2008; Cheng et al, 2009). However, the efficacy of sorafenib is modest. Novel compounds either alone or in combination with sorafenib have been actively explored in phase II or even phase III studies (Hsu et al, 2010a, 2010b; Shao et al, 2010; Cheng et al, 2011).

A staging system that can accurately predict prognosis is crucial for clinical trial designs. For phase II studies, such a system can aid in the selection of a relatively homogeneous yet representative patient population and in the comparison of efficacies across different studies. For phase III studies, such a system can facilitate proper patient stratification and ensure that patient characteristics between treatment arms are balanced. All the commonly used staging systems for HCC were developed in the pre-sorafenib era, and none of them were designed specifically for advanced disease (Okuda et al, 1985; The Cancer of the Liver Italian Program investigators, 1998; Chevret et al, 1999; Llovet et al, 1999; Green et al, 2002; Leung et al, 2002b; Kudo et al, 2003; Tateishi, 2005; Zhang et al, 2010). Although a new staging system was proposed by Tournoux-Facon et al specifically for patients with HCC in the palliative setting (Tournoux-Facon et al, 2011), this new system has not been thoroughly compared with staging systems commonly used in East Asia, such as the Chinese University Prognostic Index (CUPI), the Japan Integrated Staging (JIS) score, and the China integrated score (CIS).

Two prior studies evaluated the ability of commonly used staging systems to predict the prognosis of patients with advanced HCC (Collette et al, 2008; Huitzil-Melendez et al, 2010). One study found that the Groupe d’Etude et de Traitement du Carcinome Hepatocellulaire (GETCH) predicted patients’ prognosis more accurately than did other staging systems at a referral centre in the United States. That study did not focus on the clinical trial population (Huitzil-Melendez et al, 2010). The other study demonstrated that the Cancer of the Liver Italian Program (CLIP) score predicted prognosis more accurately than Okuda and Barcelona Clinic Liver Cancer (BCLC) for patients who received either supportive care, tamoxifen alone, or tamoxifen in combination with transarterial chemoembolisation (TACE) (Collette et al, 2008). This study did not include the GETCH and CUPI staging systems in its analysis, and the treatments it covered are very different from the current standards. Therefore, it remains unclear which of these commonly used staging systems can most accurately predict the prognosis for patients with advanced HCC who are enrolled in clinical trials.

Currently, most clinical trials of advanced HCC use BCLC and Child-Pugh scores to select patients. However, the survival outcomes of these ‘well-selected’ patients are highly variable. We hypothesised that certain current HCC staging systems can still predict prognosis of these patients, who were enrolled in clinical trials for advanced HCC. The current study was thus conducted to examine the prognosis-predicting performance of 10 staging systems, including the American Joint Committee on Cancer (AJCC; 6th edition) (Green et al, 2002), BCLC (Llovet et al, 1999), CIS (Zhang et al, 2010), CLIP score (The Cancer of the Liver Italian Program investigators, 1998), CUPI (Leung et al, 2002b), GETCH (Chevret et al, 1999), JIS score (Kudo et al, 2003), Okuda (Okuda et al, 1985), Tokyo score (Tateishi, 2005), and the staging system proposed by Tournoux-Facon et al (2011).

Materials and methods

Study population and variables

All patients who were enrolled in clinical trials that involved first-line systemic therapy for advanced HCC from May 2005 to June 2010 at National Taiwan University Hospital (NTUH), Taipei, Taiwan, were included in this study. All these studies targeted HCC patients with metastatic or locally advanced disease not amenable to loco-regional therapies, including surgery, TACE, and local ablation. All patients were required to have adequate liver reserve and organ function, good performance status, and measurable lesions according to RECIST criteria (version 1.0) (Therasse et al, 2000). Treatment regimens included either bevacizumab plus capecitabine, sorafenib plus tegafur/uracil, thalidomide plus tegafur/uracil, sorafenib, or sunitinib (Hsu et al, 2010a, 2010b; Cheng et al, 2011; Shao et al, 2012).

Data regarding patient characteristics, laboratory examination results, and overall survival (OS) were retrieved from the original study records. All patients were assessed following the rules (summarised in Supplementary Table 1) of the AJCC (6th edition) (Green et al, 2002), BCLC (Llovet et al, 1999), CIS (Zhang et al, 2010), CLIP score (The Cancer of the Liver Italian Program investigators, 1998), CUPI (Leung et al, 2002b), GETCH (Chevret et al, 1999), JIS score (Kudo et al, 2003), Okuda (Okuda et al, 1985), Tokyo score (Tateishi, 2005), and the staging system proposed by Tournoux-Facon et al (2011). This study was approved by the Institute Research Ethical Committee of NTUH.

Statistical methods

Statistical analyses were performed with the SAS statistical software (version 9.1.3, SAS Institute Inc., Cary, NC, USA). In statistical testing, a two-sided P-value ≤0.05 was considered statistically significant. The prognostic predictions of different staging systems were compared univariately by two methods. First, the Kaplan–Meier method was used to estimate OS. For every staging system, OS was compared among the stage groups using the log-rank test. Second, concordance (c) indexes were calculated for all staging systems according to the accuracy of their prediction of OS rankings and then compared with each other.
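The two univariate comparisons can be illustrated with a minimal Python sketch. It is not the SAS procedure used in this study. It assumes the product-limit form of the Kaplan–Meier estimator and a Harrell-type c index in which a higher stage score is taken to predict shorter OS; the function names, the handling of ties, and the toy data are hypothetical and introduced for illustration only.

```python
from itertools import combinations


def kaplan_meier(times, events):
    """Product-limit estimate: list of (event time, S(t)) pairs.

    times  -- follow-up time of each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def concordance_index(times, events, scores):
    """Harrell-type c index (assumption: higher score = shorter OS).

    A pair is comparable when the patient with the shorter time died.
    Pairs with tied survival times are skipped (a simplification);
    tied scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if scores[first] > scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): OS in months, death indicator, stage score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [3, 2, 2, 0]
print(kaplan_meier(toy_times, toy_events))
print(concordance_index(toy_times, toy_events, toy_scores))
```

A c index of 0.5 corresponds to a ranking no better than chance, and a value of 1.0 to a perfect ranking of OS by stage.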

A Cox proportional hazards model was utilised to compare the 10 staging systems while adjusting for other variables with a potential impact on OS. These variables included treatment regimens, age, gender, hepatitis aetiology (hepatitis B virus (HBV) or hepatitis C virus (HCV)), Karnofsky performance scale, and the presence of prior treatment. Staging systems were compared with one another using a model that involved a stepwise variable selection procedure in which the significance levels for entry and for stay were both set to 0.15. Values of R² and the Akaike information criterion (AIC), representing the accuracy of the OS prediction, were then calculated for each staging system while adjusting for the confounding variables identified by the Cox model. A higher R² or a lower AIC indicates better prediction of OS.
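How the AIC ranks competing Cox models can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SAS computation used here: it assumes a single covariate, the Breslow form of the partial likelihood for tied event times, and the usual definition AIC = -2 × log-likelihood + 2 × number of parameters. The function names and toy data are hypothetical.

```python
import math


def cox_partial_loglik(times, events, x, beta):
    """Breslow partial log-likelihood for one covariate x with coefficient beta."""
    loglik = 0.0
    for i, (t_i, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        loglik += beta * x[i] - math.log(risk_sum)
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate a better model."""
    return -2.0 * loglik + 2.0 * n_params


# Toy data (hypothetical): OS in months, death indicator, binary covariate.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_x = [1, 0, 1, 0]
for beta in (0.0, 0.5, 1.0):
    ll = cox_partial_loglik(toy_times, toy_events, toy_x, beta)
    print(beta, ll, aic(ll, 1))
```

In practice the coefficient is estimated by maximising the partial log-likelihood, and the AIC is then computed at that maximum for each candidate staging system fitted to the same patients.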

Results

Patient characteristics

A total of 157 patients, with a median age of 56 years, were included in the current study. Patients received one of the following regimens as first-line therapy for advanced HCC: bevacizumab plus capecitabine (n=20), sorafenib plus tegafur/uracil (n=68), thalidomide plus tegafur/uracil (n=34), sorafenib (n=15), or sunitinib (n=20). The patient characteristics are summarised in Table 1. Eighty-six percent of patients were male; 75% were seropositive for HBV surface antigen (HBsAg); 16% were seropositive for antibody against HCV (anti-HCV); and 92% had either extrahepatic metastasis or macroscopic vascular invasion. Except for one patient with Child-Pugh B (score=7) liver reserve, all patients were classified as having Child-Pugh A liver reserve. All patients had Karnofsky performance scale indexes ≥70; 120 (76%) patients had a Karnofsky performance scale index ≥90.

Table 1 Patient characteristics

Patients were classified into stage groups according to the 10 staging systems. The distribution of patients among the stage groups is presented in Table 2. As the study focused on patients with advanced HCC enrolled in clinical trials, no patients with early, surgically resectable disease, such as AJCC stage I or BCLC stage A, were included. Nine (6%) patients had AJCC stage II disease and 11 (7%) patients had BCLC stage B disease. These patients had disease either refractory to TACE or not amenable to TACE owing to hypovascularity. The clinical trials also excluded patients with end-stage disease or severe liver dysfunction. Therefore, none of the patients were classified as CLIP score 5, BCLC stage D, Okuda stage III, or high risk according to the staging system proposed by Tournoux-Facon et al. Interestingly, patients with different CLIP scores were more evenly distributed, with 10%, 20%, 21%, 25%, and 24% of patients having CLIP scores of 0, 1, 2, 3, and 4, respectively.

Table 2 Patient distribution of stage groups

Survival comparisons among stage groups

As of 31 December 2010, 138 (88%) patients had died, with a median follow-up time of 35.1 months. Only two patients were lost to follow-up. The median OS of all patients was 6.6 months (95% confidence interval, 5.3–7.9 months). Kaplan–Meier analysis was utilised to estimate the OS, and the log-rank test was used to univariately compare the survival of every stage group (Figure 1). The CIS (P<0.001), CLIP score (P<0.001), CUPI (P<0.001), GETCH (P<0.001), Okuda (P<0.001), Tokyo score (P<0.001), and the staging system of Tournoux-Facon et al (P<0.001) differentiated OS by their stage grouping, whereas the AJCC (P=0.133), BCLC (P=0.269), and JIS score (P=0.327) failed to do so. Notably, patients with a CIS score of 2 had better survival than patients with a CIS score of 1.

Figure 1
Kaplan–Meier analysis of overall survival (OS) by every stage group. (A) American Joint Committee on Cancer (AJCC), (B) Barcelona Clinic Liver Cancer (BCLC), (C) Okuda, (D) Cancer of the Liver Italian Program (CLIP) score, (E) Groupe d’Etude et de Traitement du Carcinome Hepatocellulaire (GETCH), (F) Chinese University Prognostic Index (CUPI), (G) Japan Integrated Staging (JIS) score, (H) Tokyo score, (I) China integrated score (CIS), and (J) the system proposed by Tournoux-Facon et al. P-values were calculated by the log-rank test.

C indexes were calculated for all the staging systems. The GETCH, CUPI, CLIP score, Okuda, and the staging system proposed by Tournoux-Facon et al had the highest c indexes (0.792, 0.775, 0.752, 0.723, and 0.710, respectively), which were not significantly different from one another (Table 3). The AJCC, CIS, and BCLC had the lowest c indexes (0.576, 0.546, and 0.535, respectively, Table 3).

Table 3 Concordance indexes, R², and AIC of staging systems for their prediction of overall survival

To adjust for variables that were less frequently incorporated into staging systems but may also have a prognostic impact on survival, we analysed all staging systems along with these variables in the multivariate analysis, including treatment regimens, age, gender, serum HBsAg, serum antibody against HCV, Karnofsky performance scale, and the presence of prior treatment for localised disease. In the final model, the CLIP score and CUPI emerged as the most accurate predictors of OS (P<0.001 and 0.009, respectively, Table 4). Hepatitis B virus infection and poor performance status were also found to predict poor OS. Adjusting for these two confounding factors, we found that the CLIP score and CUPI yielded the highest R² values (0.2938 and 0.1950, respectively) and the lowest AIC (1134.9 and 1155.5, respectively) for predicting OS (Table 3).

Table 4 Final Cox’s proportional hazards model for best staging systems to predict overall survival

Discussion

This study demonstrated that the CLIP score and CUPI can better predict survival of patients with advanced HCC who had been enrolled in clinical trials using anti-angiogenic agents as first-line therapy. This is the first study specifically focusing on such a patient population. The results can be used in the design of future clinical trials for the treatment of advanced HCC. Although all patients were selected by the eligibility criteria of clinical trials to ensure good liver reserve (99% Child-Pugh A) and performance status, these two staging systems could successfully differentiate the survival outcome within their stage groups. Their prognostic prediction was better than that of the other systems, as determined by different statistical analyses.

Although the study described here was a retrospective analysis, most items in the staging systems examined were prospectively collected upon patient enrolment in the clinical trials. Survival results were mature and very few patients were lost to follow-up. However, the results may be biased because the study only included patients from one institute. Nevertheless, such a bias should be limited because the selection criteria used in this study were generally consistent with those commonly used in other clinical trials of systemic therapy for advanced HCC.

Previous studies examined survival of patients with advanced HCC in a heterogeneous patient population. Treatment ranged from supportive care, TACE, and cytotoxic chemotherapy to targeted therapy (Collette et al, 2008; Huitzil-Melendez et al, 2010; Lin et al, 2012), and patients may not all have been enrolled in clinical trials. These studies found that the CLIP score, CUPI, or GETCH can better differentiate prognosis of these patients. Among them, our prior study found the CLIP score to be a better staging system for patients who received various systemic treatments for advanced HCC (Lin et al, 2012). In the current study, we focused on a patient population that is more relevant to current practice. The results demonstrated that the CLIP score and CUPI emerged as the best systems for predicting OS after adjusting for other potential prognostic factors that are not included in most staging systems. Above all, CLIP could be considered a pivotal stratification factor in clinical trial designs because it was repeatedly demonstrated to predict prognosis of patients enrolled in clinical trials, regardless of the treatment regimen.

In addition to the CLIP score and CUPI, we found that viral aetiology was a prognostic factor in the multivariate analyses, which is consistent with other reports (Leung et al, 2002a; Cantarini et al, 2006; Chen et al, 2006; Shao et al, 2011). Positive HBsAg was associated with poorer survival (Cantarini et al, 2006; Chen et al, 2006; Shao et al, 2011). As HCC resulting from different aetiological factors can have different carcinogenesis and molecular signatures (Okabe et al, 2000; Laurent-Puig et al, 2001; Iizuka et al, 2002; Moinzadeh et al, 2005), it is not surprising that aetiology should have an impact on prognosis of patients with HCC. In contrast, several potential prognostic predictors were not identified by the current analysis because some of them (e.g., α-fetoprotein) were incorporated into the staging systems, while others were homogeneous (e.g., all but one of our patients had Child-Pugh A status) in the entire study population.

In conclusion, our study indicates that several current HCC staging systems, especially CLIP score and CUPI, can predict survival of a highly selected patient cohort consisting of patients who were enrolled in clinical trials of advanced HCC. These two staging systems should be considered when selecting eligibility criteria and/or setting the stratification for randomisation to ensure an optimal clinical trial design.