Main

Hepatocellular carcinoma (HCC) is one of the most common malignancies in Asia and Africa, and its incidence in the Western world is increasing (Torre et al, 2015). The Barcelona Clinic Liver Cancer (BCLC) classification (Llovet et al, 1999) has been endorsed as the best staging system and treatment algorithm for HCC by the European Association for the Study of the Liver (EASL) and the American Association for the Study of Liver Diseases (Bruix et al, 2005; European Association For The Study Of The Liver and European Organisation For Research and Treatment Of Cancer, 2012). In addition to estimating prognosis, the main advantage of the BCLC staging system is that it links staging to treatment indications. For patients with intermediate-stage (BCLC-B) HCC, transarterial chemoembolization (TACE) is recommended as the first-line therapy, with improved 2-year survival compared with conservative treatment. Hepatic resection (HR) is indicated only for patients with early-stage HCC (BCLC-A) and satisfactory liver function (Bruix et al, 2005; European Association For The Study Of The Liver and European Organisation For Research and Treatment Of Cancer, 2012). Furthermore, HR for intermediate-stage HCC is considered a poor option that is associated with an unfavourable prognosis (Bruix et al, 2004).

However, many large centres with extensive experience in treating HCC, particularly those in Asia, do not subscribe to these guidelines (Chow, 2012; Yamakado and Kudo, 2014; Colombo and Sangiovanni, 2015; Zhong et al, 2015). In real-world practice, a large proportion of patients with intermediate-stage HCC receive surgical resection (Torzilli et al, 2013; Roayaie et al, 2015), particularly in regions where donor livers are scarce. Numerous recent studies have demonstrated that HR for intermediate-stage HCC appears to benefit patients in terms of 5-year survival rates, which range from 23.9 to 61.2%, and median survival time, which ranges from 22.5 to 60.4 months (Wang et al, 2008; Delis et al, 2010; Luo et al, 2011; Zhong et al, 2013; Jianyong et al, 2014; Yin et al, 2014; Cucchetti et al, 2015). As patients with intermediate-stage HCC vary widely in tumour burden, hepatic function (Child–Pugh A or B) and disease aetiology, aggressive surgical resection may not be the optimal therapy for all of them. Therefore, the subset of patients who would truly benefit from HR remains to be identified.

In this study, a simple scoring system was established based on preoperative data, with the goal of accurately predicting the postoperative long-term survival of patients with intermediate HCC. The aim was to identify the subgroup of patients with intermediate-stage HCC who would benefit most from HR.

Materials and methods

Eligibility

Intermediate-stage HCC (BCLC-B) was defined on the basis of the BCLC classification as follows: 2 to 3 tumours, of which at least 1 was >3 cm in diameter, or >3 tumours of any diameter; the absence of extrahepatic metastasis; the absence of tumour invasion into the portal or hepatic veins; and performance status 0 (Forner et al, 2010). All patients who were initially diagnosed with intermediate-stage HCC by histology or dynamic computed tomography (CT)/magnetic resonance imaging scans according to EASL diagnostic criteria (European Association For The Study Of The Liver and European Organisation For Research and Treatment Of Cancer, 2012), were treated with HR, and had pathologically proven multiple HCCs were retrospectively screened for eligibility.
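The staging definition above can be read as a simple predicate. The following is a minimal sketch of that reading, not the screening procedure used in the study; the `TumourProfile` container, its field names, and the function name are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TumourProfile:
    """Hypothetical container for the staging inputs named in the text."""
    diameters_cm: List[float]        # diameter of each nodule, in cm
    extrahepatic_metastasis: bool
    vascular_invasion: bool          # invasion of the portal or hepatic veins
    performance_status: int


def is_intermediate_stage(p: TumourProfile) -> bool:
    """Return True if the profile matches the BCLC-B definition quoted above."""
    n = len(p.diameters_cm)
    # 2 to 3 tumours with at least one > 3 cm, or more than 3 tumours of any size
    multinodular = (n in (2, 3) and max(p.diameters_cm) > 3.0) or n > 3
    return (
        multinodular
        and not p.extrahepatic_metastasis
        and not p.vascular_invasion
        and p.performance_status == 0
    )
```

For example, `is_intermediate_stage(TumourProfile([4.0, 2.0], False, False, 0))` is true, whereas a single nodule of any size, or any profile with vascular invasion, is rejected.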

Inclusion and Exclusion Criteria

Only patients who met all of the following criteria were enrolled in the study: (a) good surgical risk and age >18 years and ≤75 years; (b) a diagnosis of intermediate-stage HCC; (c) well-preserved liver function (Child–Pugh class A); (d) no anticancer treatment before surgery; and (e) resectable disease according to previously reported criteria (Luo et al, 2011), which was defined as the possibility of completely removing all tumours while retaining a sufficient liver remnant to maintain postoperative liver function, as assessed by our surgical team.

Patients were excluded from the study if they had one or more of the following: (a) a platelet count of <50 × 10⁹ l⁻¹; (b) Child–Pugh class B or C; (c) palliative tumour resection; and (d) incomplete data.

The results of the training cohort were then confirmed in an independent external validation cohort using the same inclusion/exclusion criteria.

Radiology Interpretation

All patients underwent a standardised liver imaging protocol (Pomfret et al, 2010). The image data were directly interfaced with a Picture Archiving and Communication System (PACS; Centricity, GE Healthcare, Milwaukee, WI, USA), which was used to display all image data on monitors with adjustment of the optimal window setting for each case, and viewed by two experienced abdominal radiologists. For the measurement of HCC lesions, the largest diameter was recorded using electronic calipers on the PACS monitor. Any disagreements were resolved by consensus.

Data Collection

This retrospective study was conducted in accordance with the ethical guidelines of the 1975 Declaration of Helsinki, and written informed consent was obtained from each participant. The study procedure was approved by the institutional review board of the Sun Yat-sen University Cancer Center and the Nanfang Hospital of Southern Medical University. The study was censored on 31 May 2015.

In all cases, data (demographic, clinical, biological, and radiological data, treatment outcomes, and adverse events) were prospectively collected. HCC was staged according to the American Joint Committee on Cancer TNM system, seventh edition (Edge and Compton, 2010), the Okuda system (Okuda et al, 1985), the Cancer of the Liver Italian Program (CLIP; Kudo et al, 2003), and the Chinese University Prognostic Index (CUPI; Leung et al, 2002). Furthermore, information on the recently proposed BCLC-B sub-classification (Bolondi et al, 2012) and the NDR scoring system for multiple HCC (Yang et al, 2015) was provided.

Adverse events that occurred in the hospital during admission for surgery or death within 90 days after HR were documented according to the Common Terminology Criteria for Adverse Events version 3.0 (CTCAE; Trotti et al, 2003).

Hepatic Resection

HR was performed using techniques as described previously (Shi et al, 2007; Luo et al, 2011). Intraoperative ultrasonography was performed routinely to assess the number and size of lesions and the relationship of the tumours to vascular structures. Pringle’s manoeuvre was routinely used with a clamp/unclamp time of 10 min/5 min. Anatomic resection was our preferred surgical method for multiple nodules in one segment or in neighbouring segments. For multiple bilobar nodules, anatomic resection was preferred for the main tumour, whereas satellite nodules were resected nonanatomically with a negative resection margin. When an inadequate liver remnant was found, nonanatomic resection was performed with a negative resection margin. A negative resection margin was defined as a lack of visible tumour cells at the margins of the remnant liver nearest to the gross edge of the tumour.

Follow-up

Follow-up examinations were conducted using laboratory findings (including serum AFP, liver function, and blood tests), abdominal ultrasonography, and contrast-enhanced CT every 3 months for the first year and every 6 months thereafter for a total of 60 months after treatment. All patients with HBV-related HCC who were prepared for resection for their HCC in our hospital were counselled by a hepatologist for antiviral therapy regardless of the serum HBV DNA result (Wong et al, 2011). For patients in whom tumour recurrence was detected after undergoing tumour resection, the treatment choice was determined by the characteristics of the recurrent tumour, the patient’s request, and the results of discussion by our multidisciplinary team (Luo et al, 2011).

Statistical Analysis

The main end point of the study was overall survival (OS), which was calculated from the date of resection until death or the end of the follow-up period. Disease-free survival (DFS) was defined as the interval between the operation and the date of diagnosis of the first recurrence or the last follow-up. Survival curves were calculated using the Kaplan–Meier method. Median survival times and their 95% confidence intervals (CI) are reported. Continuous variables, such as prothrombin time and tumour size (diameter), were transformed into categorical variables. Each continuous variable was divided into two or three categories by setting one or two break points, respectively, and the categories were represented by one or two binary variables; P-values were calculated for each set of break points with univariate or multivariate Cox proportional hazards regression, and the set of break points with the lowest P-value was retained if it reached significance. Univariate analysis of OS was performed on the estimation set (the training cohort). The log-rank test was used to detect significant parameters in the univariate analysis. Parameters with a log-rank P-value <0.05 in the univariate analyses were entered into the multivariate analysis.
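As a reminder of how the Kaplan–Meier method turns follow-up times and death/censoring indicators into a survival curve and a median survival time, the following is a minimal sketch in plain Python. It is illustrative only: the analyses in this study were run in SPSS, the function names are hypothetical, and confidence intervals are not computed here.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the median can remain undefined when fewer than half of the patients have died.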

Multivariate Cox-regression analysis with stepwise selection was performed to detect independent predictors of OS (the entry criterion for selection into the final multivariate model was P<0.05). The regression coefficients (B) of the Cox-regression model were multiplied by 3 and rounded to the nearest whole number to obtain simple point values that facilitate bedside calculation of the NSP score.
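The conversion from regression coefficients to points can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficient values below are hypothetical placeholders (the fitted values are reported in Table 3 and are not reproduced here), and rounding halves upward is an assumption, since the text specifies only rounding to the nearest unit.

```python
import math


def coefficient_to_points(b: float, factor: float = 3.0) -> int:
    """Scale a Cox regression coefficient and round it to an integer point value.

    Halves are rounded upward (an assumption of this sketch).
    """
    return math.floor(b * factor + 0.5)


# Hypothetical coefficients, for illustration only; not the study's fitted values.
example_coefficients = {"level_a": 0.30, "level_b": 0.70}
example_points = {
    name: coefficient_to_points(b) for name, b in example_coefficients.items()
}
```

Multiplying by a common factor before rounding preserves the relative weights of the predictors while keeping the final score an easily summed integer.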

The abilities of the different systems to differentiate prognosis were compared by the area under the receiver operating characteristic (ROC) curve for each score (which is equivalent to the concordance statistic (c-statistic); Hanley and McNeil, 1982). To perform this test, patients censored before 1, 3, and 5 years were excluded from the analysis. To further validate the discriminative ability of the NSP score, the NSP score was analysed as a survival predictor in each subgroup of commonly used staging systems by log-rank test methodology.
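The fixed-horizon comparison described above can be sketched in a few lines. The following is a minimal illustration, not the SPSS procedure used in the study: the function name and arguments are hypothetical, higher scores are assumed to indicate worse prognosis, and patients censored exactly at the horizon are excluded along with those censored earlier.

```python
from typing import List, Optional


def auc_at_horizon(
    scores: List[float], times: List[float], events: List[int], horizon: float
) -> Optional[float]:
    """AUC for predicting death by `horizon` (same time unit as `times`)."""
    dead, alive = [], []
    for s, t, e in zip(scores, times, events):
        if t <= horizon and e == 1:
            dead.append(s)       # died within the horizon
        elif t > horizon:
            alive.append(s)      # known to be alive at the horizon
        # otherwise: censored at or before the horizon, so excluded
    if not dead or not alive:
        return None
    # Mann-Whitney formulation of the AUC: probability that a patient who died
    # has a higher score than one who survived; ties count as one half.
    wins = 0.0
    for d in dead:
        for a in alive:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(dead) * len(alive))
```

An AUC of 0.5 corresponds to a score with no discriminative ability at that horizon, and 1.0 to perfect separation of patients who died from those who survived.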

To avoid overoptimistic results due to model development and evaluation using the same data set, the prognostic performance of the NSP score was assessed in an independent external validation cohort.

All reported P-values were the result of two-sided tests. A significance level of 0.05 was applied throughout. Statistical analyses were performed using IBM SPSS v.19.0 (SPSS, Armonk, NY, USA).

Results

Clinical Characteristics of the Patients

In the training cohort, of 290 patients with intermediate-stage HCC who underwent HR at the Department of Hepatobiliary Oncology of Sun Yat-sen University Cancer Center between February 2005 and December 2012, 255 patients met the inclusion criteria (Figure 1A). For the external validation cohort, 169 patients with intermediate-stage HCC who received HR at the Nanfang Hospital of Southern Medical University between May 2005 and December 2012 were ultimately included for further analysis (Figure 1B). The baseline characteristics of the training cohort and the validation cohort are provided in Table 1.

Figure 1

Flowchart in the training (A) and validation cohorts (B).

Table 1 Baseline patient and disease characteristics in the training cohort

At the time of censoring, 148 out of 255 (58.0%) patients in the training cohort and 110 out of 169 (65.1%) patients in the validation cohort had died. In the training cohort, 111 (75.0%) died of recurrence, 22 (14.9%) of liver failure, and 15 (10.1%) of other causes (haemorrhage=8, infection=3, gastric cancer=1, unknown=3). In the validation cohort, 86 (78.2%) died of recurrence, 15 (13.6%) of liver failure, and 9 (8.2%) of other causes (haemorrhage=5, hepatitis=2, renal failure=1, unknown=1).

OS and DFS in the Training and Validation Cohorts

The median follow-up periods for the training cohort and validation cohort were 26.6 (range, 1.3–96.7) months and 27.8 (range, 2.3–115.1) months, respectively. For the training cohort, the median OS was 31.5 (95% CI: 23.2–39.9) months, and the 1-, 3-, and 5-year OS rates were 77.5%, 46.5%, and 36.0%, respectively. For the validation cohort, the median OS was 30.6 (95% CI: 23.9–37.3) months, and the 1-, 3-, and 5-year OS rates were 76.3%, 43.9%, and 32.1%, respectively.

During the follow-up period, HCC recurrence was identified in 190 out of 255 patients (74.5%) in the training cohort and 133 out of 169 patients (78.7%) in the validation cohort.

For the training cohort, the median DFS was 9.7 (95% CI: 7.1–12.3) months. For the validation cohort, the median DFS was 8.3 (95% CI: 4.9–11.8) months.

Univariate and Multivariable Cox-Regression Analyses in the Training Cohort

The seven parameters of predictive value in the univariate analysis (Table 2) were entered into a Cox-regression analysis. After the stepwise removal of variables that were not significant (step 1, glutamyl transpeptidase; step 2, aspartate aminotransferase; step 3, alkaline phosphatase; step 4, alpha-fetoprotein), only prothrombin time (≤12 s/>12 s), tumour size (≤6/6–9/≥9 cm), and tumour number (≤3/>3) remained as significant predictors of OS (Supplementary Figure 1A–C). The calculated regression coefficients (B-values) were multiplied by a factor of 3 and rounded to facilitate the NSP score calculation (Table 3). The NSP score for a patient is the sum of the points assigned to these three factors, as given by the following equation:

Table 2 Univariate analysis of prognostic factors in the training cohort
Table 3 Multivariable Cox-regression analysis of prognostic factors for patients in the training cohort

NSP score = N (≤3 = 0; >3 = 2) + S (≤6 cm = 0; 6–9 cm = 1; ≥9 cm = 2) + P (≤12 s = 0; >12 s = 1).
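The equation above can be written as a small function. This is a minimal sketch rather than a validated clinical tool: the function and parameter names are hypothetical, and the handling of the exact boundaries (a tumour size of exactly 6 cm scoring 0 and exactly 9 cm scoring 2) is an assumption made here for illustration.

```python
def nsp_score(tumour_number: int, tumour_size_cm: float, prothrombin_time_s: float) -> int:
    """Sum the points for tumour Number, tumour Size, and Prothrombin time."""
    # N: more than 3 tumours scores 2 points
    n_points = 2 if tumour_number > 3 else 0

    # S: boundary handling at exactly 6 cm and 9 cm is an assumption of this sketch
    if tumour_size_cm <= 6.0:
        s_points = 0
    elif tumour_size_cm < 9.0:
        s_points = 1
    else:
        s_points = 2

    # P: prothrombin time above 12 s scores 1 point
    p_points = 1 if prothrombin_time_s > 12.0 else 0

    return n_points + s_points + p_points
```

With these point values the total ranges from 0 to 5. For example, a patient with 3 tumours, a tumour size of 7.5 cm, and a prothrombin time of 12.0 s would score 0 + 1 + 0 = 1.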

NSP Score Predicts Survival in the Training and Validation Cohorts

In the training cohort, the NSP score identified six subgroups with different prognoses (Figure 2A). Using a threshold of 1, we defined two groups of patients with significantly different OS (P<0.001), with a median OS of 61.3 months (95% CI, 44.6–78.1 months) for an NSP score ≤1 (n=125), and a median OS of 19.3 months (95% CI, 14.5–24.2 months) for an NSP score >1 (n=130; Figure 2B). Importantly, the NSP score performed similarly well in the validation cohort, with a median OS of 51.5 months (95% CI, 36.9–66.2 months) for an NSP score ≤1 (n=86), and a median OS of 17.3 months (95% CI, 12.8–21.8 months) for an NSP score >1 (n=83; Figure 2C; P<0.001).

Figure 2

Kaplan–Meier estimated OS curves by NSP score. (A) Prognostic significance of the single-point scores. The dichotomous NSP score cut-off was established based on favourable median OS in the Kaplan–Meier curves. The prognostic significance of the two NSP-score groups (≤1 point, >1 point) for OS in the training (B) and validation cohorts (C). All analyses were performed using the Kaplan–Meier method (OS in months) and the log-rank test. A full colour version of this figure is available at the British Journal of Cancer journal online.

In addition, the NSP score also had good performance in DFS prediction, with two significantly different prognostic subgroups in the training cohort (20.2 vs 4.7 months, P<0.001, Figure 3A) and the validation cohort (14.2 vs 5.0 months, P<0.001, Figure 3B).

Figure 3

Kaplan–Meier estimated DFS curves by NSP score. The prognostic significance of the two NSP-score groups (≤1 point, >1 point) for DFS in the training (A) and validation cohorts (B). A full colour version of this figure is available at the British Journal of Cancer journal online.

Comparison of the Predictive Accuracy of the NSP System and the Current Commonly Used Staging Systems in the Training and Validation Cohorts

Kaplan–Meier curves were generated for the BCLC-B sub-classification, TNM 7th, Okuda, CLIP, CUPI, and the NDR scoring system (Supplementary Figures 2A–F). The NDR scoring system, BCLC-B sub-classification, and TNM 7th staging systems showed clearly different prognostic strata. The CLIP, CUPI, and Okuda systems produced survival curves that overlapped or did not differ significantly. When all the staging systems were introduced into the Cox-regression analysis, the NSP and NDR scoring systems were selected as independent predictors in the training cohort (NSP score: hazard ratio=1.99, 95% CI: 1.33–3.00, P=0.001; NDR score: hazard ratio=1.62, 95% CI: 1.09–2.41, P=0.017). Only the NSP score was selected as an independent predictor in the validation cohort (hazard ratio=2.74, 95% CI: 1.86–4.04, P<0.001).

We next used ROC curve area analysis to determine which staging system best predicted survival. In the training cohort, the AUCs of the NSP system at 1, 3, and 5 years were 0.688, 0.739, and 0.732, respectively, and were greater than those of the six other commonly used staging systems for HCC (AUCs, 0.513 to 0.677; Figure 4A–C). In the validation cohort, the AUCs of the NSP system at 1, 3, and 5 years were 0.719, 0.750, and 0.718, respectively, and were greater than those of the six other commonly used staging systems for HCC (AUCs, 0.510 to 0.684; Figure 4A–C).

Figure 4

Receiver operating characteristic curves. The AUC of the NSP score system and the other six staging systems to predict OS in the training and validation cohorts at 1 (A), 3 (B), and 5 (C) years. A full colour version of this figure is available at the British Journal of Cancer journal online.

NSP Score Predicts OS in Subgroups of the Current Commonly Used Staging Systems

When patients were further stratified by the other six staging systems, the NSP score remained a significant OS predictor in each subgroup in the training (Supplementary Figure 3) and validation cohorts (Supplementary Figure 4). These results, combined with the highest AUC values for the NSP score, suggest that the NSP score is a better scoring system than the other six staging systems.

Furthermore, higher NSP score values were associated with an increased number of documented major adverse events within 90 days after HR in both cohorts (Table 4).

Table 4 Association of NSP score with adverse events after HR in the training and validation cohorts

Discussion

Many staging systems have been developed to classify patients with HCC. However, none of these staging systems was specifically developed to predict outcomes for patients with intermediate-stage HCC who received HR. In addition, the indication for liver resection in intermediate-stage HCC remains controversial (Forner et al, 2014, 2015; Zhong et al, 2015). Therefore, in clinical practice, a novel prognostic staging system specific for patients with intermediate-stage HCC is the key to selecting the best candidates for HR. Our study is the first to report and validate a scoring system (NSP) that predicts survival in patients with intermediate-stage HCC treated with HR. A low NSP score (≤1 point) was chosen as a cut-off because it reliably identified patients who were good candidates for HR based on their favourable median OS. The NSP score increases the pool of ideal HR candidates with intermediate-stage HCC by 50% with favourable long-term survival.

Although our study confirmed that the other six staging systems can stratify patients treated with HR into distinct risk categories, the ability of these systems to predict survival was suboptimal. The reason for the inferiority of these staging systems might be that they were originally developed from mixed populations in which only a small proportion of patients received surgery (2.8% in CLIP, 8.5% in Okuda, and 10.4% in CUPI). Different treatments have a large influence on prognosis, which should be considered in these staging systems. Furthermore, a major issue is that TNM staging and the NDR scoring system do not include measurements of liver function. The greatest criticism of the CLIP and Okuda systems is that their criteria for tumour morphology are too broad, reducing their value for patients with intermediate-stage HCC. Although the BCLC-B sub-classification was specifically developed for intermediate-stage HCC, previous studies (Ha et al, 2014; Weinmann et al, 2015) indicated that there were no significant differences in survival among several adjacent stages.

This study has several limitations. First, our effort is limited by the retrospective nature of the analysis. However, the analysis was conducted based on prospectively collected data. Furthermore, we confirmed the results derived from the initial cohort with the validation cohort to increase reliability. Second, the purpose of this study was to identify a suitable subset of patients with intermediate-stage HCC for HR. The results do not suggest that patients with an NSP score ≤1 should or should not be treated with other therapies, such as liver transplantation, radiofrequency ablation, or TACE. Third, with the majority of the patients having evidence of HBV infection, our data require validation from other study groups in which HCV infection or alcohol is the prevailing aetiology of chronic liver disease. Therefore, multicentre and prospective studies are needed to confirm our results.

In conclusion, we developed a preoperative, externally validated, simple objective prognostic score (NSP) to identify patients with intermediate-stage HCC who could benefit from aggressive surgical interventions. For intermediate-stage HCC, HR is recommended for patients with an NSP score ≤1. These findings should be validated in further prospective studies.