Abstract

The prognosis of patients with nonalcoholic fatty liver disease-related hepatocellular carcinoma (NAFLD-HCC) is intricately associated with various factors. We aimed to investigate a prognostic algorithm for NAFLD-HCC patients using a data-mining analysis. A total of 247 NAFLD-HCC patients diagnosed from 2000 to 2014 were registered from 17 medical institutions in Japan. Of these, 136 patients remained alive (Alive group) and 111 patients had died at the censor time point (Deceased group). The random forest analysis demonstrated that treatment for HCC and the serum albumin level were the first and second distinguishing factors between the Alive and Deceased groups. A decision-tree algorithm revealed that the best profile comprised treatment with hepatectomy or radiofrequency ablation and a serum albumin level ≥3.7 g/dL (Group 1). The second-best profile comprised treatment with hepatectomy or radiofrequency ablation and a serum albumin level <3.7 g/dL (Group 2). The 5-year overall survival rate was significantly higher in Group 1 than in Group 2. Thus, we demonstrated that curative treatment for HCC combined with a serum albumin level ≥3.7 g/dL was the best prognostic profile for NAFLD-HCC patients. This novel prognostic algorithm for patients with NAFLD-HCC could be used for clinical management.

Introduction

Hepatocellular carcinoma (HCC) is the third most common cause of cancer-related death worldwide1. Many etiologies of HCC have been identified, including hepatitis B virus (HBV) infection, hepatitis C virus (HCV) infection, and excess alcohol intake. Recently, however, non-alcoholic fatty liver disease (NAFLD), which affects increasing numbers of patients in both Western countries and Asia, has become an exceptionally common risk factor for HCC2,3. We previously demonstrated that patients with NAFLD-related HCC (NAFLD-HCC) and those with alcoholic liver disease-related HCC had similarly poor prognoses, although the prevalence of liver cirrhosis was significantly lower in the NAFLD-HCC group4.

The prognosis of patients with HCC is influenced by various tumor-, host-, and treatment-related factors. For example, the tumor stage at diagnosis, vascular invasion, HCC recurrence, and distant metastasis are well established prognostic factors5,6,7. In addition, hepatic function, as assessed based on the serum albumin and bilirubin levels, and the presence of concomitant complications of obesity and diabetes influence the outcomes of patients with HCC8,9,10. Finally, therapies such as hepatic resection, radiofrequency ablation, transarterial chemoembolization, and sorafenib affect the prognosis of patients with HCC11,12,13. Although interactions among these factors influence prognosis, their relative contributions remain unclear.

A data mining analysis is a machine learning approach in which artificial intelligence is used to reveal factors and interactions between variables from large data sets, even if no a priori hypothesis has been imposed14. The benefits of this approach include the discovery of hidden factors/profiles and the provision of additional information that cannot be identified through a logistic regression analysis, and the results could be used to make stepwise decisions about disease management15. A random forest analysis is a data mining technique used to identify factors that distinguish between case and control groups. This type of analysis is associated with a high level of predictive accuracy and can be used to estimate the relative importance of each factor16. Additionally, decision tree analysis is a data mining technique that prioritizes variables to reveal a series of classification rules17,18. This type of analysis classifies data sets into groups using profiles that comprise multiple factors. Recently, these data mining techniques have been used to investigate prognostic factors for pancreatic cancer19, breast cancer20, and leukemia21. To our knowledge, however, these newer statistical techniques have never been used to investigate the prognosis of patients with NAFLD-HCC.

The aim of this study was to investigate the factors associated with the prognosis of NAFLD-HCC patients using a random forest analysis. We additionally investigated profiles associated with prognosis using a decision tree analysis.

Results

Baseline characteristics and comparisons of the Alive and Deceased groups

The baseline patient characteristics and comparisons of the Alive and Deceased groups are summarized in Table 1. Patients in the Alive group were significantly younger than those in the Deceased group. The HCC size and number and serum AFP and DCP levels were significantly lower in the Alive group than in the Deceased group (Table 1). Furthermore, a significantly higher number of NAFLD-HCC patients were treated with hepatic resection in the Alive group than in the Deceased group. The serum albumin levels were significantly higher in the Alive group than in the Deceased group (Table 1); however, no significant differences were seen in HbA1c values, platelet counts, or serum levels of total bilirubin and total cholesterol between the two groups (Table 1). HCC was the main cause of death, and liver-related deaths accounted for 84.7% of all deaths (Table 1).

Table 1 Characteristics of enrolled patients and comparison of the Alive and Deceased groups.

Overall analysis

A multivariate analysis showed that HCC treatment with other modalities or BSC, age, and TNM stage III or IV were independent risk factors related to the prognosis of patients with NAFLD-HCC (Table 2). Meanwhile, the serum albumin level and body mass index (BMI) were found to be independent negative risk factors (Table 2).

Table 2 Cox regression model analysis of the prognosis of NAFLD-HCC patients.

A random forest analysis demonstrated that treatment for HCC, serum albumin level, and TNM stage were the first, second, and third distinguishing factors, respectively, between the Alive and Deceased groups (Fig. 1A).

Figure 1

Factors/profiles associated with prognosis in the overall cohort of patients with NAFLD-HCC. (A) Random forest analysis. Data are expressed as variable importance. (B) Decision-tree algorithm. Patients with NAFLD-HCC are classified according to the indicated cut-off value of each factor. The pie graphs indicate the proportions of alive (white) and deceased patients (black). (C) Kaplan–Meier analysis. Abbreviations: NAFLD, non-alcoholic fatty liver disease; TNM, tumor-node-metastasis; AFP, alpha-fetoprotein; DCP, des-γ-carboxy prothrombin; ALP, alkaline phosphatase; GGT, gamma-glutamyl transpeptidase; AST, aspartate aminotransferase; HbA1c, hemoglobin A1c; ALT, alanine aminotransferase; BMI, body mass index; HCC, hepatocellular carcinoma; LDH, lactate dehydrogenase; HBc, hepatitis B core; BUN, blood urea nitrogen.

A decision-tree algorithm with 2 divergence variables was created to classify 4 profiles of patients (Fig. 1B). Treatment for HCC was the first variable in the initial classification. Among patients treated with hepatic resection or RFA, a serum albumin level ≥3.7 g/dL was the second-division variable in this classification. The serum albumin level was also the second-division variable among patients treated with TACE, other modalities, or BSC. As shown in Fig. 1B, the mortality rate of patients treated with hepatic resection or RFA and presenting with a serum albumin level ≥3.7 g/dL (Group 1) was 25.0% (22/88). By contrast, the mortality rate of patients treated with TACE, other modalities, or BSC and presenting with serum albumin levels <3.8 g/dL (Group 4) was 75.7% (53/70).
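The profile assignment described above can be written as a short rule set. The following Python sketch is illustrative only and is not the code used in the study. The function names are hypothetical, and the definition of Group 3 (albumin ≥3.8 g/dL in the TACE/others/BSC branch) is an assumption inferred from the Group 4 cut-off, because the text does not state it explicitly.

```python
# Illustrative sketch of the two-level profile tree described for Fig. 1B.
# Names are hypothetical; this is not the study's own code.
CURATIVE = {"hepatic resection", "RFA"}


def classify_profile(treatment: str, albumin_g_dl: float) -> int:
    """Return the profile number (1-4) for one patient."""
    if treatment in CURATIVE:
        # Cut-off reported in the text for the resection/RFA branch.
        return 1 if albumin_g_dl >= 3.7 else 2
    # Cut-off reported for the TACE/others/BSC branch (Group 4: < 3.8 g/dL).
    # ASSUMPTION: Group 3 is the complementary branch (>= 3.8 g/dL).
    return 3 if albumin_g_dl >= 3.8 else 4


def mortality_rate(deaths: int, total: int) -> float:
    """Crude mortality as a percentage, rounded to one decimal place."""
    return round(100.0 * deaths / total, 1)


if __name__ == "__main__":
    print(classify_profile("RFA", 3.9))    # prints 1
    print(classify_profile("TACE", 3.5))   # prints 4
    print(mortality_rate(22, 88))          # prints 25.0 (Group 1)
    print(mortality_rate(53, 70))          # prints 75.7 (Group 4)
```

The two mortality calculations reproduce the percentages reported for Groups 1 and 4 from the stated counts.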

A Kaplan–Meier analysis yielded respective 1-, 3-, and 5-year survival rates of 100%, 92.3%, and 75.6% in Group 1 and 46.5%, 22.0%, and 9.0% in Group 4. Significant differences in overall survival were observed between Groups 1 and 4 (HR = 9.98, 95% CI: 5.76–17.29, P < 0.0001) (Fig. 1C).

Stratification analysis according to TNM stage of HCC

A stratification analysis was performed according to the TNM stage of HCC. In each stage, the prognostic factors and profiles were analyzed using exploratory analyses, including random forest analysis and decision tree analysis. NAFLD-HCC patients were classified into groups according to the results of the decision tree analysis, and differences in survival rates among the groups were analyzed by Kaplan–Meier analysis.

TNM stage I

A multivariate analysis identified the prothrombin activity and serum AST levels as independent prognostic factors for patients with TNM stage I NAFLD-HCC (Table 3). Here, a random forest analysis demonstrated that the treatment of HCC, age, and serum total cholesterol level were the first, second, and third distinguishing factors between the Alive and Deceased groups (Fig. 2A). Next, a decision-tree algorithm was created using only the total cholesterol level (Fig. 2B). Among patients with a total cholesterol level ≥182 mg/dL (Group sI-1), the mortality rate was 13% (2/17). By contrast, the mortality rate among patients with a total cholesterol level <182 mg/dL (Group sI-2) was 48% (11/23). A Kaplan–Meier analysis yielded respective 1-, 3-, and 5-year survival rates of 100.0%, 93.3%, and 93.3% in Group sI-1 and 86.7%, 86.7%, and 52.6% in Group sI-2. Significant differences in survival were observed between Groups sI-1 and sI-2 (HR = 13.66, 95% CI: 1.71–109.26, P = 0.0018) (Fig. 2C).

Table 3 Stratification analysis of the prognosis of NAFLD-HCC patients according to TNM stage.
Figure 2

Factors/profiles associated with the prognosis of TNM stage I HCC patients with NAFLD. (A) Random forest analysis. Data are expressed as variable importance. (B) Decision-tree algorithm. Patients with NAFLD-HCC are classified according to the indicated cut-off value for each factor. The pie graphs indicate the proportions of alive (white) and deceased patients (black). (C) Kaplan–Meier analysis. Abbreviations: NAFLD, non-alcoholic fatty liver disease; DCP, des-γ-carboxy prothrombin; ALT, alanine aminotransferase; HCC, hepatocellular carcinoma; HBc, hepatitis B core; LDH, lactate dehydrogenase; AFP, alpha-fetoprotein; AST, aspartate aminotransferase; BMI, body mass index; GGT, gamma-glutamyl transpeptidase; HbA1c, hemoglobin A1c; ALP, alkaline phosphatase; BUN, blood urea nitrogen.

TNM stage II

A multivariate analysis identified the serum albumin level as an independent negative risk factor and age as an independent risk factor among patients with TNM stage II NAFLD-HCC (Table 3). Here, the serum albumin level was the first distinguishing factor between the Alive and Deceased groups in a random forest analysis (Fig. 3A). A decision-tree algorithm based only on the serum albumin level was created and used to classify 2 groups of patients (Fig. 3B). Accordingly, the mortality rate among patients with a serum albumin level ≥3.6 g/dL (Group sII-1) was 35% (24/68). By contrast, the mortality rate among those with a serum albumin level <3.6 g/dL (Group sII-2) was 61% (22/36). A Kaplan–Meier analysis yielded 1-, 3-, and 5-year survival rates of 98.5%, 87.4%, and 69.0% in Group sII-1 and 79.0%, 44.1%, and 23.1% in Group sII-2, respectively. These differences in survival between Groups sII-1 and sII-2 were significant (HR = 4.42, 95% CI: 2.36–8.29, P < 0.0001) (Fig. 3C).

Figure 3

Factors/profiles associated with the prognosis of TNM stage II HCC patients with NAFLD. (A) Random forest analysis. Data are expressed as variable importance. (B) Decision-tree algorithm. Patients with NAFLD-HCC are classified according to the indicated cut-off value for each factor. The pie graphs indicate the proportions of alive (white) and deceased patients (black). (C) Kaplan–Meier analysis. Abbreviations: NAFLD, non-alcoholic fatty liver disease; AFP, alpha-fetoprotein; ALP, alkaline phosphatase; HbA1c, hemoglobin A1c; HCC, hepatocellular carcinoma; HBc, hepatitis B core; GGT, gamma-glutamyl transpeptidase; BUN, blood urea nitrogen; ALT, alanine aminotransferase; AST, aspartate aminotransferase; LDH, lactate dehydrogenase; DCP, des-γ-carboxy prothrombin; BMI, body mass index.

We also performed a propensity score matching analysis to reduce selection bias and confounding, using a propensity score calculated from age, sex, BMI, HCC treatment, platelet count, total bilirubin level, and the presence of diabetes mellitus and hypertension (Supplementary Table 1). After propensity score matching, a Kaplan–Meier analysis yielded 1-, 3-, and 5-year survival rates of 87.5%, 50.0%, and 37.5% in Group sII-1 and 66.7%, 13.3%, and 0.0% in Group sII-2, respectively. The difference in survival between Group sII-1 and Group sII-2 was significant (HR = 6.00, 95% CI: 4.50–8.11, P < 0.0001) (Supplementary Figure 1).
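The matching procedure itself is not specified in the text. As a rough illustration of what propensity score matching involves, the Python sketch below pairs patients from two groups by greedy 1:1 nearest-neighbour matching without replacement. The matching rule, the caliper of 0.05, the function name, and the toy scores are all assumptions made for illustration; the scores are taken as already estimated (for example, from a logistic regression on the covariates listed above).

```python
# Illustrative sketch only: greedy 1:1 nearest-neighbour matching on
# precomputed propensity scores. The matching rule and caliper are
# ASSUMPTIONS; the study does not state which procedure was used.


def greedy_match(group_a, group_b, caliper=0.05):
    """Pair each patient in group_a with the closest unused patient in group_b.

    group_a, group_b: dicts mapping a patient id to a propensity score.
    Returns a list of (id_a, id_b) pairs whose scores differ by <= caliper.
    """
    available = dict(group_b)
    pairs = []
    # Fixed processing order (ascending score) keeps the result reproducible.
    for a_id, a_score in sorted(group_a.items(), key=lambda kv: kv[1]):
        if not available:
            break
        b_id = min(available, key=lambda k: abs(available[k] - a_score))
        if abs(available[b_id] - a_score) <= caliper:
            pairs.append((a_id, b_id))
            del available[b_id]  # matching without replacement
    return pairs


if __name__ == "__main__":
    # Hypothetical scores for three patients in each albumin group.
    high_albumin = {"a": 0.30, "b": 0.52, "c": 0.90}
    low_albumin = {"x": 0.31, "y": 0.50, "z": 0.55}
    print(greedy_match(high_albumin, low_albumin))
```

In this toy input, patient "c" has no counterpart within the caliper and is left unmatched, which is how matching reduces the analyzed sample.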

TNM stage III

A multivariate analysis identified the serum albumin level and BMI as independent negative risk factors among patients with TNM stage III NAFLD-HCC (Table 3). A random forest analysis identified the serum albumin level as the first distinguishing factor between the Alive and Deceased groups (Fig. 4A). A decision-tree algorithm was created with 3 divergence variables and used to classify 4 patient profiles (Fig. 4B). Here, DCP was used as the first variable in the initial classification. Among patients with a DCP level >32 mAU/mL, the second variable was the serum albumin level. Among patients with a serum albumin level >3.5 g/dL, the third variable was the serum bilirubin level. Here, all patients with a DCP level <32 mAU/mL (Group sIII-1, 12/12) remained alive. By contrast, the mortality rate among patients with a DCP level >32 mAU/mL and a serum albumin level <3.5 g/dL (Group sIII-4) was 78.9% (15/19). According to Kaplan–Meier analysis, the respective 1- and 3-year survival rates were 100% and 100% in Group sIII-1 and 36.8% and 13.1% in Group sIII-4. Significant differences in survival were observed between Groups sIII-1 and sIII-4 (HR = 2.7e+09, 95% CI: 0.0e+00–Infinity, P = 5.2e−06) (Fig. 4C).

Figure 4

Factors/profiles associated with the prognosis of TNM stage III HCC patients with NAFLD. (A) Random forest analysis. Data are expressed as variable importance. (B) Decision-tree algorithm. Patients with NAFLD-HCC are classified according to the indicated cut-off value for each factor. The pie graphs indicate the proportions of alive (white) and deceased patients (black). (C) Kaplan–Meier analysis. Abbreviations: NAFLD, non-alcoholic fatty liver disease; DCP, des-γ-carboxy prothrombin; ALT, alanine aminotransferase; AFP, alpha-fetoprotein; ALP, alkaline phosphatase; GGT, gamma-glutamyl transpeptidase; BUN, blood urea nitrogen; AST, aspartate aminotransferase; LDH, lactate dehydrogenase; HBc, hepatitis B core; HCC, hepatocellular carcinoma; BMI, body mass index; HbA1c, hemoglobin A1c.

TNM stage IV

A multivariate analysis identified the serum levels of DCP, creatinine, and LDH and positivity for the HBc antibody as independent prognostic factors among patients with TNM stage IV NAFLD-HCC (Table 3). The serum albumin level and BMI were identified as independent negative risk factors (Table 3). A random forest analysis identified the serum DCP, AST, and albumin levels as the first, second, and third distinguishing factors between the Alive and Deceased groups (Fig. 5A). A decision-tree algorithm was created based only on the serum albumin level and was used to classify 2 groups of patients (Fig. 5B). Although the mortality rate of patients with serum albumin levels of ≥4 g/dL (Group sIV-1) was 69% (9/13), this rate increased to 95% (21/22) among those with serum albumin levels <4 g/dL (Group sIV-2). A Kaplan–Meier analysis yielded respective 1-, 3-, and 5-year survival rates of 69.2%, 44.9%, and 33.7% in Group sIV-1 and 30.0%, 10.0%, and 5% in Group sIV-2. Significant differences in survival were observed between these groups (HR = 3.68, 95% CI: 1.58–8.57, P = 0.0025) (Fig. 5C).

Figure 5

Factors/profiles associated with the prognosis of TNM stage IV HCC patients with NAFLD. (A) Random forest analysis. Data are expressed as variable importance. (B) Decision-tree algorithm. Patients with NAFLD-HCC are classified according to the indicated cut-off value for each factor. The pie graphs indicate the proportions of alive (white) and deceased patients (black). (C) Kaplan–Meier analysis. Abbreviations: NAFLD, non-alcoholic fatty liver disease; DCP, des-γ-carboxy prothrombin; AST, aspartate aminotransferase; LDH, lactate dehydrogenase; AFP, alpha-fetoprotein; GGT, gamma-glutamyl transpeptidase; HCC, hepatocellular carcinoma; HbA1c, hemoglobin A1c; HBc, hepatitis B core; BUN, blood urea nitrogen; ALP, alkaline phosphatase; BMI, body mass index; ALT, alanine aminotransferase.

Discussion

To our knowledge, this is the first study to apply an artificial intelligence-based approach to one of the largest NAFLD-HCC data sets to investigate the prognostic factors/profiles relevant to patients. Our study used a random forest analysis to demonstrate that treatment for HCC, the serum albumin level, and the TNM stage were significant prognostic factors among patients with NAFLD-HCC. A decision tree analysis revealed that a patient profile comprising curative treatment for HCC and a serum albumin level ≥3.7 g/dL was associated with a better prognosis. Moreover, both random forest and decision tree analyses stratified by TNM stage revealed that the serum albumin level was a prognostic factor for patients with stage II–IV NAFLD-HCC.

Although the benefits of data mining analysis include the discovery of hidden factors/profiles with high predictive accuracy, one obstacle to this type of approach is the requirement for a large data set; therefore, we used the large data sets from JSG-NAFLD (n = 247). The clinical features of NAFLD-HCC in this study were similar to those in a previous report of another large data set study from the HCC-NAFLD Italian Study Group (n = 145)22. In addition, more than 95% of enrolled patients in our study had data for all variables, including AFP and DCP, thus confirming the reliability of our data sets. Moreover, none of the NAFLD-HCC patients enrolled in this study had undergone liver transplantation for reasons including advanced HCC, lack of a donor, age, or religious objections, which allowed us to discern the natural history of NAFLD-HCC.

Most HCCs arise in the context of chronic liver diseases with various etiologies, including chronic HBV/HCV infection, alcohol consumption, and NAFLD. For patients with HBV-related HCC, nucleotide analog therapy is known to improve prognosis after curative cancer treatment23. Similarly, for patients with HCV-related HCC, interferon-based treatment may improve prognosis by improving the liver reserve after curative treatment for HCC24. Therefore, treatment for the underlying liver disease or dysfunction, in addition to curative treatment of the primary tumor, can improve patient outcomes. However, little is known about the prognostic profiles of patients with NAFLD-HCC. In this study, we applied data mining techniques and identified a better prognosis for a profile comprising curative treatment for HCC and a serum albumin level ≥3.7 g/dL. Although obesity and type 2 diabetes mellitus have been identified as potent risk factors for HCC in patients with NAFLD25,26, our algorithm, which is specific to NAFLD patients, suggests that the liver reserve is a more important prognostic factor than obesity or type 2 diabetes mellitus.

The tumor stage is widely considered an absolute categorical factor for survival in patients with primary liver tumors. Although various tumor staging systems have been used, the TNM system is reported to predict the prognoses of patients with both advanced and early tumors27. Therefore, we performed both random forest and decision tree analyses stratified by TNM stage and again found that the serum albumin level influenced prognosis, particularly among those with TNM stage II–IV disease. Recently, the albumin-bilirubin grade, an index of the functional liver reserve, was shown to predict prognosis across all stages of HCC in a study wherein 93% of patients had virus-related cancers28. The present results are consistent with those of the earlier study, and the liver functional reserve seems to be a universal prognostic factor for most HCC patients, regardless of the chronic liver disease etiology.

In our study, the serum albumin level was a prognostic factor for patients with NAFLD-HCC, indicating that hepatic fibrosis is a prognostic factor. In addition, our findings suggested that the serum albumin level had a higher impact on prognosis than other hepatic parameters, including platelet count, prothrombin activity, total cholesterol, and bilirubin, in both the random forest and decision-tree analyses. We also performed propensity score matching. Even after matching, the survival rate of patients with a serum albumin level ≥3.6 g/dL was significantly higher than that of patients with a serum albumin level <3.6 g/dL. These findings also suggest that serum albumin has implications beyond its role as a hepatic fibrosis-related factor. The decreased albumin may be caused by low protein intake and/or oxidative stress-induced degradation of albumin29. Serum albumin exerts anti-oxidative activity by harboring a disulfide-bonded cysteine at the thiol of Cys34, and the oxidized albumin is degraded by endogenous proteases29. Albumin is also known to bind with cisplatin at the III domain to enhance the anti-tumor activity of this drug12. In fact, the baseline serum albumin level is a prognostic factor in patients with various malignancies, including those of the colon, lung, and breast30,31,32. Moreover, Nojiri et al. reported that albumin suppresses the proliferation of HCC cell lines by upregulating the expression of p21 and p57 and consequently increasing the G0/G1 cell population33. Thus, the serum albumin level may reflect the degree of oxidative stress and anti-tumor activity in patients with NAFLD.

A limitation of this study is that the algorithm was not validated; a further prospective study is required to test its reliability. We must also be cautious in interpreting the results of the Cox regression model analysis. In this study, we proposed a novel prognostic algorithm based on treatment for HCC and the serum albumin level. In addition, age, BMI, and TNM stage were identified as independent prognostic factors in the Cox regression model analysis. Thus, these independent factors should also be considered in the management of patients with NAFLD-HCC.

In conclusion, this nationwide data mining analysis-based study identified treatment for HCC, the serum albumin level, and the TNM stage as significant long-term prognostic factors among patients with NAFLD-HCC. We identified a profile comprising curative treatment for HCC and a serum albumin level ≥3.7 g/dL as predictive of a better prognosis. Furthermore, we identified the serum albumin level as a prognostic factor for patients with stage II–IV HCC. These findings suggest that this novel prognostic algorithm could be used for the clinical management of patients with NAFLD-HCC.

Subjects and Methods

Study design and ethics

This retrospective study was designed in 2015 by the steering committee of the Japan Study Group of NAFLD (JSG-NAFLD) as a multicenter investigation of the prognosis of patients with NAFLD-HCC. This protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki, as reflected by the prior approval of the institutional review board of Kurume University School of Medicine, Tokyo Women’s Medical University, JA Hiroshima General Hospital, Hiroshima University, Sapporo Kosei General Hospital, Kochi Medical School, Kawasaki Medical School, Asahikawa Medical University, Nayoro City General Hospital, Yokohama City University School of Medicine, Oita University, Saga University, Nara City Hospital, Kyoto Prefectural University of Medicine, Aichi Medical University, National Center for Global Health and Medicine, Osaka University, Osaka City University, and Osaka City Juso Hospital. All experiments were performed in accordance with relevant guidelines and regulations. An opt-out approach was used to obtain informed consent from the patients, and personal information was protected during data collection.

Subjects

A total of 247 consecutive patients diagnosed with NAFLD-HCC between 2000 and 2014 were registered from 17 medical institutions in Japan. Of these, 136 patients remained alive (Alive group) and 111 patients had died (Deceased group) at the censor time of this study (December 2014).

Diagnosis of NAFLD and HCC

NAFLD-HCC was diagnosed according to the Clinical Practice Guidelines for NAFLD/nonalcoholic steatohepatitis (NASH) as follows34: (1) hepatic steatosis evaluated by liver biopsy, ultrasonography, computed tomography, or magnetic resonance imaging; (2) ethanol intake <20 g/day in women or <30 g/day in men; and (3) exclusion of other liver diseases, including HBV, HCV, autoimmune hepatitis, drug-induced liver disease, primary biliary cholangitis, primary sclerosing cholangitis, biliary obstruction, Wilson’s disease, and hemochromatosis.

HCC was diagnosed via histological examination or a combination of serum tumor markers such as α-fetoprotein (AFP) and des-γ-carboxy prothrombin (DCP), as well as imaging modalities such as ultrasonography, computed tomography, magnetic resonance imaging, and/or angiography according to the Japanese Clinical Practice guidelines for HCC: The Japan Society of Hepatology35.

Inclusion and exclusion criteria

The following patient inclusion criteria were used: (1) NAFLD-HCC, (2) age >18 years, (3) no previous treatment for HCC, and (4) complete follow-up from the initial treatment for HCC until death or the study censor time (December 2014). The exclusion criteria were as follows: (1) a history of a malignant tumor other than HCC within the 5 years preceding the study and (2) participation in any drug trial.

Data collection

Variables related to host, tumor, and treatment factors were retrospectively reviewed using clinical records. The following data were collected at the time of diagnosis of HCC: host factors, including age, sex, body mass index (BMI), smoking (pack-year), hemoglobin level, platelet count, fasting blood glucose level, hemoglobin A1c (HbA1c) level, prothrombin activity, and serum levels of aspartate aminotransferase (AST), alanine aminotransferase (ALT), lactate dehydrogenase (LDH), gamma-glutamyl transpeptidase (γ-GTP), alkaline phosphatase (ALP), albumin, total bilirubin, total cholesterol, high density lipoprotein-cholesterol, low density lipoprotein-cholesterol, triglyceride, blood urea nitrogen (BUN), creatinine, and hepatitis B core (HBc) antibody; tumor factors, including the size and number of HCC, serum levels of AFP and DCP, gross classification of HCC, and clinical staging (tumor-node-metastasis [TNM] classification) based on the criteria of the Liver Cancer Study Group of Japan36 (stage I, n = 40; stage II, n = 104; stage III, n = 66; stage IV, n = 35; lack of sufficient data for staging; n = 2); and treatment factors such as the selected treatment modality [hepatic resection, radiofrequency ablation (RFA), transarterial chemoembolization (TACE), others (sorafenib, radiotherapy, and hepatic arterial infusion chemotherapy), best supportive care (BSC)]. Treatments were selected according to the HCC guidelines of the Japan Society of Hepatology37.

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Definition of event and follow-up

In this study, an event was defined as death from any cause. After the initial treatment for HCC, patients were followed up until death or the study censor date through routine physical examinations, biochemical tests (including serum AFP and DCP levels), and abdominal imaging (including ultrasonography, computed tomography, or magnetic resonance imaging) according to the HCC guidelines of the Japan Society of Hepatology37. HCC patients treated with BSC were also followed up.

Statistics

Data are expressed as numbers or means ± standard deviations. Differences between the two groups were analyzed using the Mann–Whitney U test. Factors or profiles associated with the prognosis of NAFLD-HCC patients were analyzed using data mining techniques. All statistical analyses were conducted by a biostatistician (AK). The statistical methods are described in detail below.

Multivariate stepwise analysis

A Cox regression model was used to identify independent variables associated with the prognosis of NAFLD-HCC in a multivariate analysis. Given the purpose of this study, we did not conduct a univariate analysis. Explanatory variables were selected from the variables listed in Table 1 in a stepwise manner to minimize the Bayesian information criterion, as previously described15. Data are expressed as hazard ratios (HR) and 95% confidence intervals (CI).
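For orientation, the standard textbook forms of the quantities named above are given below; they are not taken from the study's own code. Here h_0(t) is the unspecified baseline hazard, x the vector of explanatory variables, β the coefficient vector, L the maximized partial likelihood, and k the number of selected variables. The sample-size term n is an assumption about implementation: depending on the software, it may be the number of patients or the number of events.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j),
\qquad
\mathrm{BIC} = -2\,\ln L(\hat{\boldsymbol{\beta}}) + k\,\ln n
```

Under this formulation, an HR below 1 corresponds to what the Results describe as an independent negative risk factor, and stepwise selection retains the variable set with the smallest BIC.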

Random forest analysis

A random forest analysis was used to identify factors that distinguished between the Alive and Deceased groups on an ordinal scale, as previously described15. The variable importance (VI) value, which reflects the relative contribution of each variable to the model, was estimated by randomly permuting its values and recalculating the predictive accuracy of the model.
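The permutation idea behind the VI value can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the study's implementation: a fixed threshold rule (`toy_model`) stands in for a fitted random forest, the data are invented, and all function names are hypothetical.

```python
import random

# Illustrative sketch of permutation-based variable importance.
# ASSUMPTION: a fixed threshold rule replaces the fitted random forest.


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(model, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after randomly shuffling one column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, overwriting only the permuted column.
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


def toy_model(row):
    """Stand-in classifier: predicts death (1) when albumin is low."""
    return 1 if row["albumin"] < 3.7 else 0


if __name__ == "__main__":
    data = [(4.2, 1), (4.0, 7), (3.9, 3), (3.8, 9),
            (3.5, 2), (3.3, 8), (3.1, 4), (2.9, 6)]
    rows = [{"albumin": a, "noise": n} for a, n in data]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = deceased (invented data)
    for col in ("albumin", "noise"):
        print(col, permutation_importance(toy_model, rows, labels, col))
```

A variable that the model does not use (here, `noise`) yields an importance of zero, whereas shuffling a variable the model relies on degrades accuracy.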

Decision tree algorithm

A decision-tree algorithm was constructed to reveal profiles associated with the prognosis of NAFLD-HCC according to the instructions provided with the R software package (http://www.R-project.org/)38.
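The text does not state which R routine or splitting criterion was used. As a hedged illustration of how a classification tree chooses a cut-off such as an albumin threshold, the Python sketch below searches for the single split that minimizes the weighted Gini impurity, which is a common CART-style criterion assumed here for illustration. The data and function names are invented.

```python
# Illustrative sketch of a CART-style search for one split point.
# ASSUMPTION: Gini impurity is used; the study's criterion is not stated.


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value >= threshold' split.

    Returns (None, impurity_of_all_labels) if no split improves on no split.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo_v, hi_v = pairs[i - 1][0], pairs[i][0]
        if lo_v == hi_v:
            continue  # identical values cannot be separated
        threshold = (lo_v + hi_v) / 2.0  # midpoint between neighbours
        left = [y for v, y in pairs if v < threshold]
        right = [y for v, y in pairs if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    albumin = [4.2, 4.0, 3.9, 3.8, 3.5, 3.3, 3.1, 2.9]  # invented values
    died = [0, 0, 0, 0, 1, 1, 1, 1]
    print(best_split(albumin, died))
```

Applying the same search recursively within each resulting subgroup yields multi-level profiles of the kind shown in the decision-tree figures.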

Kaplan–Meier analysis

NAFLD-HCC patients were classified into the corresponding group of the decision-tree algorithm. The overall survival of each group was estimated using the Kaplan–Meier method, and differences in survival between the groups were analyzed using the log-rank test.
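The two procedures named above can be sketched from their textbook definitions. The Python code below is a minimal illustration using only the standard library; it is not the R code used in the study, the function names are hypothetical, and the follow-up data are invented. The log-rank function assumes two groups and that the pooled variance is non-zero.

```python
import math

# Illustrative sketch of the Kaplan-Meier estimator and a two-group
# log-rank test. Not the study's R code; data below are invented.


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(all_times, all_events) if e == 1}):
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group A at time t.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi_square, math.erfc(math.sqrt(chi_square / 2.0))


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
    print(log_rank([6, 7, 8, 9], [1, 0, 1, 0], [1, 2, 3, 4], [1, 1, 1, 1]))
```

Censored patients contribute to the number at risk until their last follow-up but never count as deaths, which is why the estimator differs from a crude mortality rate.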

All P values were 2-tailed, and a value <0.05 was considered statistically significant. The multivariate stepwise analysis, random forest analysis, decision tree analysis, and Kaplan–Meier analysis were performed using the R software package38.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Bertuccio, P. et al. Global Trends and Predictions in Hepatocellular Carcinoma Mortality. J Hepatol, in press, https://doi.org/10.1016/j.jhep.2017.03.011 (2017).
  2. White, D. L., Thrift, A. P., Kanwal, F., Davila, J. & El-Serag, H. B. Incidence of Hepatocellular Carcinoma in All 50 United States, From 2000 Through 2012. Gastroenterology 152, 812–820 e815, https://doi.org/10.1053/j.gastro.2016.11.020 (2017).
  3. Tateishi, R. et al. Clinical characteristics, treatment, and prognosis of non-B, non-C hepatocellular carcinoma: a large retrospective multicenter cohort study. J Gastroenterol 50, 350–360, https://doi.org/10.1007/s00535-014-0973-8 (2015).
  4. Tokushige, K. et al. Hepatocellular carcinoma in Japanese patients with nonalcoholic fatty liver disease and alcoholic liver disease: multicenter survey. J Gastroenterol 51, 586–596, https://doi.org/10.1007/s00535-015-1129-1 (2016).
  5. Llovet, J. M., Bru, C. & Bruix, J. Prognosis of hepatocellular carcinoma: the BCLC staging classification. Semin Liver Dis 19, 329–338, https://doi.org/10.1055/s-2007-1007122 (1999).
  6. Han, K. H. et al. Asian consensus workshop report: expert consensus guideline for the management of intermediate and advanced hepatocellular carcinoma in Asia. Oncology 81(Suppl 1), 158–164, https://doi.org/10.1159/000333280 (2011).
  7. Liu, P. H. et al. Prognosis of hepatocellular carcinoma: Assessment of eleven staging systems. J Hepatol 64, 601–608, https://doi.org/10.1016/j.jhep.2015.10.029 (2016).
  8. Wang, Q. et al. Impact of liver fibrosis on prognosis following liver resection for hepatitis B-associated hepatocellular carcinoma. Br J Cancer 109, 573–581, https://doi.org/10.1038/bjc.2013.352 (2013).
  9. Knox, J. J. Addressing the interplay of liver disease and hepatocellular carcinoma on patient survival: the ALBI scoring model. J Clin Oncol 33, 529–531, https://doi.org/10.1200/JCO.2014.59.0521 (2015).
  10. Raffetti, E. et al. Role of aetiology, diabetes, tobacco smoking and hypertension in hepatocellular carcinoma survival. Dig Liver Dis 47, 950–956, https://doi.org/10.1016/j.dld.2015.07.010 (2015).
  11. Lin, S. M., Lin, C. J., Lin, C. C., Hsu, C. W. & Chen, Y. C. Radiofrequency ablation improves prognosis compared with ethanol injection for hepatocellular carcinoma ≤4 cm. Gastroenterology 127, 1714–1723 (2004).
  12. Yamakado, K. et al. Hepatic arterial embolization for unresectable hepatocellular carcinomas: do technical factors affect prognosis? Jpn J Radiol 30, 560–566, https://doi.org/10.1007/s11604-012-0088-1 (2012).
  13. Llovet, J. M. et al. Sorafenib in advanced hepatocellular carcinoma. N Engl J Med 359, 378–390, https://doi.org/10.1056/NEJMoa0708857 (2008).
  14. Bellazzi, R. & Zupan, B. Predictive data mining in clinical medicine: current issues and guidelines. Int J Med Inform 77, 81–97, https://doi.org/10.1016/j.ijmedinf.2006.11.006 (2008).
  15. Yamada, S. et al. Serum albumin level is a notable profiling factor for non-B, non-C hepatitis virus-related hepatocellular carcinoma: A data-mining analysis. Hepatol Res 44, 837–845, https://doi.org/10.1111/hepr.12192 (2014).
  16. Touw, W. G. et al. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Brief Bioinform 14, 315–326, https://doi.org/10.1093/bib/bbs034 (2013).
  17. Kurosaki, M. et al. A predictive model of response to peginterferon ribavirin in chronic hepatitis C using classification and regression tree analysis. Hepatol Res 40, 251–260, https://doi.org/10.1111/j.1872-034X.2009.00607.x (2010).
  18. Pauker, S. G. & Kassirer, J. P. The threshold approach to clinical decision making. N Engl J Med 302, 1109–1117 (1980).
  19. Diouf, M. et al. Prognostic value of health-related quality of life in patients with metastatic pancreatic adenocarcinoma: a random forest methodology. Qual Life Res 25, 1713–1723, https://doi.org/10.1007/s11136-015-1198-x (2016).
  20. Chao, C. M., Yu, Y. W., Cheng, B. W. & Kuo, Y. L. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree. J Med Syst 38, 106, https://doi.org/10.1007/s10916-014-0106-1 (2014).
  21. Masic, N. et al. Decision-tree approach to the immunophenotype-based prognosis of the B-cell chronic lymphocytic leukemia. Am J Hematol 59, 143–148 (1998).
  22. Piscaglia, F. et al. Clinical patterns of hepatocellular carcinoma in nonalcoholic fatty liver disease: A multicenter prospective study. Hepatology 63, 827–838, https://doi.org/10.1002/hep.28368 (2016).
  23. Liu, G. M., Huang, X. Y., Shen, S. L., Hu, W. J. & Peng, B. G. Adjuvant antiviral therapy for hepatitis B virus-related hepatocellular carcinoma after curative treatment: A systematic review and meta-analysis. Hepatol Res 46, 100–110, https://doi.org/10.1111/hepr.12584 (2016).
  24. Hsu, C. S., Chao, Y. C., Lin, H. H., Chen, D. S. & Kao, J. H. Systematic Review: Impact of Interferon-based Therapy on HCV-related Hepatocellular Carcinoma. Sci Rep 5, 9954, https://doi.org/10.1038/srep09954 (2015).
  25. Dyson, J. et al. Hepatocellular cancer: the impact of obesity, type 2 diabetes and a multidisciplinary team. J Hepatol 60, 110–117, https://doi.org/10.1016/j.jhep.2013.08.011 (2014).
  26. Kawamura, Y. et al. Large-scale long-term follow-up study of Japanese patients with non-alcoholic Fatty liver disease for the onset of hepatocellular carcinoma. Am J Gastroenterol 107, 253–261, https://doi.org/10.1038/ajg.2011.327 (2012).
  27. Kee, K. M. et al. Validation of the 7th edition TNM staging system for hepatocellular carcinoma: an analysis of 8,828 patients in a single medical center. Dig Dis Sci 58, 2721–2728, https://doi.org/10.1007/s10620-013-2716-8 (2013).
  28. Pinato, D. J. et al. The ALBI grade provides objective hepatic reserve estimation across each BCLC stage of hepatocellular carcinoma. J Hepatol 66, 338–346, https://doi.org/10.1016/j.jhep.2016.09.008 (2017).
  29. Kawakami, A. et al. Identification and characterization of oxidized human serum albumin. A slight structural change impairs its ligand-binding and antioxidant functions. FEBS J 273, 3346–3357, https://doi.org/10.1111/j.1742-4658.2006.05341.x (2006).
  30. Gonzalez-Trejo, S. et al. Baseline serum albumin and other common clinical markers are prognostic factors in colorectal carcinoma: A retrospective cohort study. Medicine (Baltimore) 96, e6610, https://doi.org/10.1097/MD.0000000000006610 (2017).
  31. Fiala, O. et al. Serum albumin is a strong predictor of survival in patients with advanced-stage non-small cell lung cancer treated with erlotinib. Neoplasma 63, 471–476, https://doi.org/10.4149/318_151001N512 (2016).
  32. Lis, C. G., Grutsch, J. F., Vashi, P. G. & Lammersfeld, C. A. Is serum albumin an independent predictor of survival in patients with breast cancer? JPEN J Parenter Enteral Nutr 27, 10–15, https://doi.org/10.1177/014860710302700110 (2003).
  33. Nojiri, S. & Joh, T. Albumin suppresses human hepatocellular carcinoma proliferation and the cell cycle. Int J Mol Sci 15, 5163–5174, https://doi.org/10.3390/ijms15035163 (2014).
  34. Watanabe, S. et al. Evidence-based clinical practice guidelines for nonalcoholic fatty liver disease/nonalcoholic steatohepatitis. Hepatol Res 45, 363–377, https://doi.org/10.1111/hepr.12511 (2015).
  35. Kokudo, N. et al. Evidence-based Clinical Practice Guidelines for Hepatocellular Carcinoma: The Japan Society of Hepatology 2013 update (3rd JSH-HCC Guidelines). Hepatol Res 45, https://doi.org/10.1111/hepr.12464 (2015).
  36. Minagawa, M., Ikai, I., Matsuyama, Y., Yamaoka, Y. & Makuuchi, M. Staging of hepatocellular carcinoma: assessment of the Japanese TNM and AJCC/UICC TNM systems in a cohort of 13,772 patients in Japan. Ann Surg 245, 909–922, https://doi.org/10.1097/01.sla.0000254368.65878.da (2007).
  37. Arii, S. et al. Management of hepatocellular carcinoma: Report of Consensus Meeting in the 45th Annual Meeting of the Japan Society of Hepatology (2009). Hepatol Res 40, 667–685, https://doi.org/10.1111/j.1872-034X.2010.00673.x (2010).
  38. R Development Core Team. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, 2012).

Acknowledgements

The authors thank Drs Yu Noda (Kurume University School of Medicine), Masahito Nakano (Kurume University School of Medicine), Takashi Niizeki (Kurume University School of Medicine), Kazuhisa Kodama (Tokyo Women’s Medical University), Tomomi Kogiso (Tokyo Women’s Medical University), Kensuke Munekage (Kochi Medical School), Kayo Endo (Nara City Hospital), Tasuku Hara (Kyoto Prefectural University of Medicine), Naohiko Masaki (National Center for Global Health and Medicine), Shintaro Mikami (National Center for Global Health and Medicine), Masatoshi Imamura (National Center for Global Health and Medicine), Yasushi Kojima (National Center for Global Health and Medicine), and Satoshi Oeda (Saga University) for providing clinical data.

Author information

Affiliations

  1. Department of Medicine, Kurume University School of Medicine, Kurume, Japan
    • Takumi Kawaguchi & Takuji Torimura
  2. Department of Internal Medicine and Gastroenterology, Tokyo Women’s Medical University, Tokyo, Japan
    • Katsutoshi Tokushige & Etsuko Hashimoto
  3. Department of Gastroenterology and Hepatology, JA Hiroshima General Hospital, Hatsukaichi, Japan
    • Hideyuki Hyogo
  4. Department of Gastroenterology and Metabolism, Applied Life Science, Institute of Biomedical and Health Sciences, Hiroshima University, Hiroshima, Japan
    • Hiroshi Aikata & Kazuaki Chayama
  5. Department of Hepatology, Sapporo Kosei General Hospital, Sapporo, Japan
    • Tomoaki Nakajima & Yoshiyasu Karino
  6. Department of Gastroenterology and Hepatology, Kochi Medical School, Nankoku, Japan
    • Masafumi Ono & Toshiji Saibara
  7. Department of General Internal Medicine 2, General Medical Center, Kawasaki Medical School, Okayama, Japan
    • Miwa Kawanaka
  8. Division of Gastroenterology and Hematology/Oncology, Department of Medicine, Asahikawa Medical University, Asahikawa, Japan
    • Koji Sawada
  9. Department of Gastroenterology, Nayoro City General Hospital, Nayoro, Japan
    • Yasuaki Suzuki
  10. Department of Gastroenterology and Hepatology, Yokohama City University School of Medicine, Yokohama, Japan
    • Kento Imajo & Masato Yoneda
  11. Department of Gastroenterology, Faculty of Medicine, Oita University, Yufu, Japan
    • Koichi Honda & Masataka Seike
  12. Internal Medicine, Saga University Faculty of Medicine, Saga, Japan
    • Hirokazu Takahashi
  13. Liver Center, Saga University Hospital, Saga, Japan
    • Yuichiro Eguchi
  14. Center for Digestive and Liver Diseases, Nara City Hospital, Nara, Japan
    • Kohjiroh Mori & Saiyu Tanaka
  15. Department of Gastroenterology and Hepatology, Kyoto Prefectural University of Medicine, Kyoto, Japan
    • Yuya Seko
  16. Division of Hepatology and Pancreatology, Department of Internal Medicine, Aichi Medical University, Nagakute, Japan
    • Yoshio Sumida
  17. Department of Gastroenterology, National Center for Global Health and Medicine, Tokyo, Japan
    • Yuichi Nozaki & Mikio Yanase
  18. Department of Gastroenterology and Hepatology, Osaka University, Graduate School of Medicine, Suita, Japan
    • Yoshihiro Kamada & Tetsuo Takehara
  19. Department of Hepatology, Osaka City University, Graduate School of Medicine, Osaka, Japan
    • Hideki Fujii
  20. Department of Gastroenterology and Hepatology, Osaka City Juso Hospital, Osaka, Japan
    • Hideki Fujii
  21. Center for Comprehensive Community Medicine, Faculty of Medicine, Saga University, Saga, Japan
    • Atsushi Kawaguchi
  22. Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Sydney, NSW, Australia
    • Jacob George

Authors

Takumi Kawaguchi, Katsutoshi Tokushige, Hideyuki Hyogo, Hiroshi Aikata, Tomoaki Nakajima, Masafumi Ono, Miwa Kawanaka, Koji Sawada, Kento Imajo, Koichi Honda, Hirokazu Takahashi, Kohjiroh Mori, Saiyu Tanaka, Yuya Seko, Yuichi Nozaki, Yoshihiro Kamada, Hideki Fujii, Atsushi Kawaguchi, Tetsuo Takehara, Mikio Yanase, Yoshio Sumida, Yuichiro Eguchi, Masataka Seike, Masato Yoneda, Yasuaki Suzuki, Toshiji Saibara, Yoshiyasu Karino, Kazuaki Chayama, Etsuko Hashimoto, Jacob George & Takuji Torimura

Contributions

T.K. designed the study and wrote the initial draft of the manuscript. A.K. contributed to the analysis and interpretation of data and assisted in the preparation of the manuscript. K.T., H.H., H.A., T.N., M.O., M.K., K.S., K.I., K.H., H.T., K.M., S.T., Y.S., Y.N., Y.K. and H.F. contributed to data collection and interpretation. T.T., M.Y., Y.S., Y.E., M.S., M.Y., Y.S., T.S., Y.K., K.C., E.H., J.G. and T.T. critically reviewed the manuscript.

Competing Interests

The authors declare no competing interests.

Corresponding author

Correspondence to Takumi Kawaguchi.

About this article

DOI

https://doi.org/10.1038/s41598-018-28650-0
