Construction and validation of a nomogram for predicting cancer-specific survival in hepatocellular carcinoma patients

The prognosis of patients with hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a research hotspot. This study aimed to incorporate important factors obtained from the SEER database to construct and validate a nomogram for predicting the cancer-specific survival (CSS) of patients with HCC and ICC. We obtained patient data from the SEER database. The nomogram was constructed based on six prognostic factors for predicting CSS rates in HCC patients. The nomogram was validated using the concordance index (C-index), the receiver operating characteristic (ROC) curve, and calibration curves. A total of 3227 patients diagnosed with HCC (3038) and ICC (189) between 2010 and 2015 were included in this study. The C-index of the nomogram for HCC patients was 0.790 in the training cohort and 0.806 in the validation cohort. The 3- and 5-year AUCs were 0.811 and 0.793 in the training cohort. The calibration plots indicated that there was good agreement between the actual observations and predictions. In conclusion, we constructed and validated a nomogram for predicting the 3- and 5-year CSS in HCC patients. We have confirmed the precise calibration and excellent discrimination power of our nomogram.

Primary liver cancer, the fourth leading cause of cancer-related mortality and the sixth most common cancer, is a global health issue 1,2 . Histologically, primary liver cancer includes three main subtypes: hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (ICC), and mixed hepatocellular cholangiocarcinoma 3 . HCC is the most frequent subtype, accounting for more than 80% of all primary liver cancers 4,5 . Second in frequency only to HCC, ICC arises from the epithelial layer of the second-degree biliary tract and is highly malignant 6 . The prognosis of patients with HCC and ICC is a research hotspot because new and more effective treatment strategies need to be based on information regarding prognostic risks. Prognostic factors such as patient age, tumor grade, and the American Joint Committee on Cancer (AJCC) stage have been used to predict patients' survival time and response to treatment 7,8 . However, oncologists find it difficult to use these factors when they are not consolidated; it is therefore necessary to integrate multiple prognostic factors into an easy-to-use predictive system to better inform oncologists and more accurately stratify patients.
A nomogram is a predictive tool that creates a simple graph based on a predictive statistical model 9 . It can be used to calculate the probability of a clinical event by considering the prognostic weight of each factor. Nomograms have been widely used to assist clinical decision-making 10-12 . This study aimed to incorporate important factors obtained from the Surveillance, Epidemiology, and End Results (SEER) database to construct and validate a nomogram for predicting the cancer-specific survival (CSS) of patients with HCC and ICC.
Statistical analysis. For nomogram construction and validation, we randomly divided all the HCC patients into training (n = 2123) and validation (n = 915) cohorts, in a ratio of 7:3 13,14 . Multivariate Cox proportional hazards regression analysis was performed to identify variables (P < 0.05) that significantly affected CSS in the training group. Using these identified prognostic factors, we constructed a nomogram for predicting 3- and 5-year CSS rates in HCC and ICC patients 15 .
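As an illustration of the random split-sample step, the following minimal Python sketch shuffles patient identifiers and cuts them at the training-cohort size reported above. The function name `split_cohort`, the seed, and the use of consecutive integers as patient identifiers are assumptions made for this example; the analysis in this study was run in SPSS and R, and the authors' actual procedure is not shown here.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Shuffle IDs reproducibly, then cut them into training and validation sets.

    `seed` is an arbitrary choice for this sketch, not a value from the study.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so global random state is untouched
    return ids[:n_train], ids[n_train:]


# 3038 HCC patients with 2123 assigned to training (cohort sizes from the text).
# Integer IDs 0..3037 are placeholders for real patient identifiers.
train_ids, valid_ids = split_cohort(range(3038), n_train=2123)
print(len(train_ids), len(valid_ids))
```

Fixing the training-cohort size directly, rather than multiplying by 0.7, keeps the sketch consistent with the reported cohort sizes, which correspond to a ratio of roughly 7:3.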
The nomogram was validated internally in the training cohort and externally in the validation cohort. To evaluate the discriminative ability of the nomogram, we used the concordance index (C-index) and the receiver operating characteristic (ROC) curve and assessed the area under the curve (AUC) 16,17 . A C-index or AUC of 0.5 indicates a discrimination ability that is no better than chance, whereas that of 1.0 indicates a perfect discrimination ability 18 . Calibration curves were constructed using a bootstrap approach, with 500 resamples, to compare the predicted CSS with the CSS observed in the study.
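To make the C-index concrete, the sketch below computes Harrell's concordance index for right-censored survival data in plain Python. It is an illustrative implementation under simplifying assumptions (pairs with tied survival times are skipped), not the routine used in this study; the function name and the toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times       : follow-up time for each patient
    events      : 1 if the event (cancer-specific death) was observed, 0 if censored
    risk_scores : model output where a higher value means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification for this sketch: ignore tied times
        # `a` is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (invented for illustration only).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
print(harrell_c_index(toy_times, toy_events, [4, 3, 2, 1]))
```

A model that assigns the same score to every patient yields 0.5 under this definition, which matches the interpretation of 0.5 as chance-level discrimination.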
All statistical analyses were conducted using SPSS (version 24.0; SPSS, Chicago, IL, USA) and R software (version 3.6.1; http://www.r-project.org/). A P value of less than 0.05 was considered to indicate statistical significance.

Results
Patient characteristics. A total of 3227 patients diagnosed with HCC (3038) and ICC (189) between 2010 and 2015 were included in this study. The training and validation cohorts of HCC patients consisted of 2123 and 915 cases, respectively, selected by the random split-sample method (split ratio: 7:3). In the total cohort of HCC patients, the majority of patients were under 65 years old (59.4%), white (66.5%), and male (76.0%). Furthermore, most of the patients had T1 (47.6%), N0 (95.7%), and M0 (93.5%) disease; grade II tumors accounted for 50.5% of all cases. A large proportion of the patients had elevated AFP levels (64.9%) and cirrhosis (69.5%), and 46.6% of patients underwent tumor resection surgery. The characteristics of HCC patients in the training and validation cohorts were similar to those in the total cohort (Table 1).
For ICC patients, the training and validation cohorts consisted of 132 and 57 cases, respectively. In the total cohort of ICC patients, most characteristics (age, race, sex, AJCC stage, surgery, and grade) were similar to those of HCC patients. In contrast, the majority of ICC patients had normal AFP (73.5%) and fibrosis (66.7%) levels. The characteristics of ICC patients in the training and validation cohorts were similar to those in the total cohort (Supplementary Table S1).

Nomogram construction.
Due to the small sample size (189) and the few independent prognostic factors identified for ICC patients, the nomogram was constructed only for HCC patients. A nomogram based on the prognostic factors selected in the training cohort was developed to predict the CSS of HCC patients at 3 and 5 years (Fig. 1). The nomogram showed that surgery contributed the most to prognosis, followed by AJCC T, AJCC M, grade, fibrosis, and AFP level. Each level of every variable was assigned a score on the points scale. The total score was obtained by adding the scores of the selected variables. The prediction corresponding to this total score was then used to estimate the 3- and 5-year CSS for each HCC patient.
www.nature.com/scientificreports/
better than traditional AJCC stage in both the training and validation cohorts. The calibration plots for the 3- and 5-year CSS indicated that there was good agreement between the actual observations and predictions made using the nomogram in both the training cohort (Fig. 3) and the validation cohort (Fig. 4).
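The scoring procedure described for the nomogram (points per variable level, summed into a total score that maps to a CSS estimate) can be sketched in Python as follows. Every number in this example is hypothetical: the coefficients, the level names, and the baseline survival value are invented for illustration and are not the fitted values from this study. The sketch assumes the common convention in which the variable with the widest coefficient range spans 0 to 100 points and survival follows the Cox relation S(t | x) = S0(t) ** exp(linear predictor).

```python
import math

# HYPOTHETICAL Cox coefficients (log hazard ratios versus a reference level).
# Invented for illustration only; NOT the values estimated in this study.
COEFS = {
    "surgery": {"transplantation": 0.0, "resection": 0.5, "none": 2.0},
    "ajcc_t": {"T1": 0.0, "T2": 0.3, "T3": 0.8, "T4": 1.0},
    "ajcc_m": {"M0": 0.0, "M1": 0.9},
    "grade": {"I": 0.0, "II": 0.2, "III": 0.5, "IV": 0.7},
    "fibrosis": {"none_to_moderate": 0.0, "severe_or_cirrhosis": 0.4},
    "afp": {"normal": 0.0, "elevated": 0.3},
}


def build_points_table(coefs):
    """Rescale coefficients so the widest-ranging variable spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {lvl: 100.0 * (b - min(lv.values())) / widest for lvl, b in lv.items()}
        for var, lv in coefs.items()
    }


def total_points(table, patient):
    """Add up the points of the patient's level for every variable."""
    return sum(table[var][patient[var]] for var in table)


def survival_probability(coefs, patient, baseline_survival):
    """Cox-model survival: S(t | x) = S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(coefs[var][patient[var]] for var in coefs)
    return baseline_survival ** math.exp(linear_predictor)


# Two hypothetical patients: all reference levels versus all highest-risk levels.
REFERENCE_PATIENT = {var: min(lv, key=lv.get) for var, lv in COEFS.items()}
HIGH_RISK_PATIENT = {var: max(lv, key=lv.get) for var, lv in COEFS.items()}

POINTS = build_points_table(COEFS)
BASELINE_3Y = 0.9  # hypothetical 3-year baseline CSS, not taken from the study

for label, patient in (("reference", REFERENCE_PATIENT), ("high risk", HIGH_RISK_PATIENT)):
    print(label, total_points(POINTS, patient),
          survival_probability(COEFS, patient, BASELINE_3Y))
```

In this sketch the surgery variable has the widest coefficient range, so it spans the full 0 to 100 points, mirroring the statement that surgery contributed the most to prognosis; the ordering of the remaining hypothetical ranges follows the same stated order of contribution.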

Discussion
Determining accurate tumor prognosis after definitive treatment is important 19 . Although the AJCC staging system is widely used for predicting prognosis in primary liver cancer patients, it has inherent defects because it neglects many significant risk factors such as race, age, and grade. Nomograms have been shown to be more accurate and user-friendly than the conventional staging system in many cancers 20,21 . In this study, we constructed a more comprehensive model based on a combination of various risk factors to better predict prognosis in HCC patients.
To identify independent prognostic factors, we performed univariate and multivariate Cox proportional hazards regression analyses. Univariate analysis indicated that race, sex, AJCC T, AJCC N, AJCC M, surgery, grade, AFP level, and fibrosis were potential prognostic factors for HCC patients. In multivariate analysis, only AJCC T, AJCC M, surgery, grade, AFP level, and fibrosis remained independent prognostic factors for HCC patients, and the nomogram was constructed based on those six factors. Our results showed that HCC patients who underwent liver transplantation had a better prognosis than those who underwent tumor resection. This finding may inform treatment decisions for both doctors and patients.
For ICC patients, only AJCC T, AJCC M, and surgery were prognostic factors in both univariate and multivariate analyses. These factors closely resemble the traditional AJCC staging system, and the sample size (189) of ICC patients was relatively small. Therefore, it was not meaningful to construct a nomogram based on these three factors for ICC patients. A future study with a larger sample size and more risk factors is needed.
It is widely known that a model has relatively good discrimination if its C-index and AUC exceed 0.7 22 ; therefore, our model has good discriminative ability. Furthermore, the calibration plots indicated that the CSS probabilities predicted by our nomogram agreed closely with the observed ones. The validation results indicate that our nomogram could be effective for application in the clinical setting.
Our study has some limitations. First, this large-sample retrospective study was based on the SEER database, which may have some inherent biases. Second, data regarding several potentially important prognostic factors such as HBsAg, AST, CEA, and vascular invasion were not available in the SEER database. Third, due to the small sample size (189) and the few independent prognostic factors identified for ICC patients, the nomogram was constructed only for HCC patients. Finally, our nomogram was internally validated, and it needs to be validated externally using other populations.

Conclusion
We constructed and validated a nomogram for predicting the 3- and 5-year CSS in HCC patients. The proposed nomogram considered six independent risk factors: AJCC T, AJCC M, surgery, tumor grade, AFP level, and fibrosis. We have confirmed the precise calibration and excellent discrimination power of our nomogram. The predictive power of this nomogram may be improved by considering other potentially important factors that could not be obtained from the SEER database, and also by external validation.