Development of machine learning-based clinical decision support system for hepatocellular carcinoma

There is a significant discrepancy between the actual choice for initial treatment option for hepatocellular carcinoma (HCC) and recommendations from the currently used BCLC staging system. We develop a machine learning-based clinical decision support system (CDSS) for recommending initial treatment option in HCC and predicting overall survival (OS). From hospital records of 1,021 consecutive patients with HCC treated at a single centre in Korea between January 2010 and October 2010, we collected information on 61 pretreatment variables, initial treatment, and survival status. Twenty pretreatment key variables were finally selected. We developed the CDSS from the derivation set (N = 813) using random forest method and validated it in the validation set (N = 208). Among the 1,021 patients (mean age: 56.9 years), 81.8% were male and 77.0% had positive hepatitis B BCLC stages 0, A, B, C, and D were observed in 13.4%, 26.0%, 18.0%, 36.6%, and 6.3% of patients, respectively. The six multi-step classifier model was developed for treatment decision in a hierarchical manner, and showed good performance with 81.0% of accuracy for radiofrequency ablation (RFA) or resection versus not, 88.4% for RFA versus resection, and 76.8% for TACE or not. We also developed seven survival prediction models for each treatment option. Our newly developed HCC-CDSS model showed good performance in terms of treatment recommendation and OS prediction and may be used as a guidance in deciding the initial treatment option for HCC.

Scientific RepoRtS | (2020) 10:14855 | https://doi.org/10.1038/s41598-020-71796-z www.nature.com/scientificreports/ Hepatocellular carcinoma (HCC) is the third and seventh most common malignancy in men and women worldwide, respectively, and its incidence continues to increase 1 . The American Association for the Study of Liver Diseases and the European Association for the Study of the Liver currently endorse the Barcelona Clinic Liver Cancer (BCLC) staging system as a primary prognostic model and a allocating tool of HCC treatment 2,3 . However, there is a significant discrepancy in the initial treatment choice for HCC between the recommendations from the BCLC system and real clinical practice 4,5 . This is partially because treatment decision for HCC is highly multifactorial, in which physicians need to take into consideration the HCC stage, baseline liver function, and performance status. Moreover, other factors such as location and distribution of tumour, presence of intermediate nodule, comorbidities, socio-economic status, availability of potential living related-donors, and the invasiveness and feasibility of each treatment option play critical roles in determining the clinical outcomes of patients with HCC. Such complex nature of HCC treatment decision has hindered large-sized clinical studies, because conventional statistical methods fall short of aptly controlling multiple variables and factors.
Recent attempts on applying the artificial intelligence (AI) technique to clinical practice have focused on using AI to develop clinical decision support system (CDSS) [6][7][8][9][10][11] . For this study, we reasoned that machine learning, an application of AI that self-improves by learning from large amounts of data, would be useful for generating an algorithm for that evaluates multiple pretreatment variables to recommend optimal treatment options for HCC 12 . We believed that a good-quality database is essential, and the definition and selection of pretreatment variables are also significantly important for clinically plausible results. In this study, we thus gathered a team of well experienced hepatologists and AI scientists at our centre and developed a CDSS algorithm that can recommend optimal initial treatment for patients with HCC and predict the overall survival (OS) of patients after treatment, based on our centre's experiences.

Methods
Study population. We retrospectively reviewed hospital records of 1,650 consecutive patients who were newly diagnosed with HCC at Asan Medical Center (Seoul, Korea) between January and October 2010 (Supplementary Fig. 1). Patients who had a treatment history of HCC (N = 356), those who received HCC treatment at other hospitals (N = 138), those who had a metastatic liver cancer (N = 71), those with secondary malignancies that might affect survival (N = 36), those with combined hepatocellular-cholangiocarcinoma (N = 21), and those with incidentally detected HCC after transplantation (N = 7) were excluded from the study. Consequently, the study cohort included 1,021 patients with HCC.
All enrolled patients were diagnosed with HCC using liver protocol computed tomography or magnetic resonance imaging or liver biopsy according to the current guidelines of the American Association for the Study of Liver Diseases 13 . Patients were randomly allocated to the derivation or validation set at a ratio of 4:1. The protocols of this study were approved by the Institutional Review Board of Asan Medical Center (IRB number: 2017-0188), and the requirement for informed consent from patients was waived due to the retrospective nature of the study. All methods were performed in accordance with the relevant guidelines and regulations. Data collection. We used our institutional database to collect information on the initial treatment option, initial treatment response, and OS of all patients. We retrospectively collected pre-treatment demographic, clinical, and imaging variables (Supplementary Table S1), treatment information and survival status of all the 1,021 patents from our centre's database. The following demographic factors were assessed: age, sex, Eastern Cooperative Oncology Group (ECOG) score, aetiology of liver disease, presence of potential liver-related donor, body mass index (BMI), occupation, resident area, patients' educational attainment, maximum tumour size, tumour number, tumour type (infiltrative or nodular), tumour enhancement pattern, tumour distribution, portal vein invasion, hepatic vein or inferior vena cava invasion, bile duct invasion, extrahepatic metastases, presence of dysplastic nodule, radiofrequency ablation (RFA) feasibility, presence of cirrhosis, Child-Pugh class, presence of varix, laboratory findings including alpha-feto protein (AFP) level, within or above the Milan criteria, initial treatment option, initial treatment response, and OS. RFA feasibility was defined as a size or location of the tumour to receive percutaneous RFA successfully without significant complications, evaluated by a single hepatologist, G.H.C. Tumour location adjacent to the large vessel, bile duct, hepatic hilum, liver capsule or extrahepatic organ was classified as an RFA non-feasible lesion. OS was defined as the time form date of imaging diagnosis of HCC to the date of death due to any cause.
Among the 61 initial pretreatment variables, 20 key variables (Table 1) were selected based on the importance scores calculated using the automated classifier model and the survival prediction model in the derivation set. Specifically, 14 variables were patient-related factors (age, BMI, Child-Pugh class, presence of varix, presence of ascites, ECOG score, haemoglobin level, platelet count, albumin level, prothrombin time, alanine aminotransferase [ALT] level, total bilirubin level, creatinine level, and AFP level), and six were tumour-related factors (tumour number, maximal tumour size, tumour distribution, presence of portal vein invasion, presence of metastasis, and RFA feasibility). Using these 20 variables, random forest and random survival forest methods were trained and evaluated again to recommend treatment options and to predict OS respectively in both the derivation and validation sets.
Treatment options were classified as follows: transplantation, surgical resection, RFA or percutaneous ethanol injection therapy (PEIT), transarterial chemoembolisation (TACE), TACE combined with external beam radiotherapy (EBRT), sorafenib treatment, supportive care, and other therapies, which included combined therapy (e.g. surgical resection with intraoperative RFA, TACE combined with sorafenib), palliative resection, intra-arterial cytotoxic chemotherapy, clinical trials, and EBRT alone. Database review was performed by one hepatologist (G.H.C.) to avoid inter-observer bias.   www.nature.com/scientificreports/ Machine learning for CDSS development in HCC. The primary outcomes were accuracies of treatment recommendation and survival prediction. The index date was defined as the date when patients underwent their first liver protocol computed tomography or magnetic resonance imaging. The follow-up period for each patient was estimated from the index date to the date of death or the last follow-up date. Due to large differences in survival between treatments, it was difficult to train a machine learning-based model of treatment recommendation and survival prediction in an integrated way. Therefore, treatment recommendation and survival prediction models were separately designed and trained. Treatment recommendation models were hierarchically designed with six classifiers in the same manner as treatment planning in clinical practice. Supervised learning was adapted to prefer curative modalities using a classifier method. Transplantation option was not included in the treatment decision algorithm due to the medical environment of severe shortage of deceased liver donor. Although transplantation was not included in the classifier model, transplantation was suggested as an option, when it met the Millan criteria. Because factors affecting the prognosis were different for each treatment, we developed survival prediction models for each treatment. Our CDSS system operated by sequentially using treatment recommendation and survival prediction models.
To develop the treatment recommendation and survival prediction model, random forest model was employed. Random forest, which is one of the representative ensemble methods, is widely used because it is powerful and relatively lighter than other ensemble methods 14,15 . Random forest constructs a number of treetype base models and forms an ensemble through a technique called bootstrap aggregating or bagging. As the splitting rules for random forests, Gini impurity and log-rank test were used for treatment recommendation and survival prediction models, respectively. Other possible combinations of hyperparameters of models were investigated by grid search using GridSearchCV library in Scikit-learn package. Figure 1 shows the schematic diagram for the construction of the CDSS model for HCC. The model comprised six multi-step classifiers and seven OS prediction sub-models. The input variables (N = 20) were processed with the algorithm for treatment recommendation with multi-step classifiers. The CDSS model for HCC was designed to prefer curative modalities (transplantation, resection, RFA or PEIT). Once a treatment option is selected, the model demonstrates the predicted OS curve for each patient. Additionally, if another treatment option is available, our CDSS model for HCC can suggest another predicted OS curve after the alternative treatment. Therefore, the model can predict different OS curves of the same patient with different treatments, which could be helpful when clinicians made treatment decisions in actual clinical setting.

Statistical analyses.
Baseline characteristics of the patients were compared using the chi-square test for categorical variables and the Mann-Whitney U test for continuous variables. Survival distributions were compared using the Kaplan-Meier method with a log-rank test. Patients in our follow-up programme who were not confirmed deceased were recorded as censored.
In the initial phase of model development, we fitted a univariate Cox proportional hazards model to the treatment decision and survival endpoints. To select variables, we employed a two-step variable selection approach. The first step was to fit a random forest model to compute a variable importance score, and the second step was to compute a relative selection frequency based on a bootstrap resampling method 16,17 .
For the validation data sets, per-patient based analysis was performed from probability values using accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each classifier. The accuracy was defined as the percentage of correctly classified instances and calculated as follows: accuracy = (TP + TN)/ (TP + TN + FP + FN), where TP, TN, FP, and FN are true positive, true negative, false positive, and false negative, respectively. Each survival prediction model was validated using bootstrapping to correct for optimistic bias. Time-dependent concordance (C)-index was used to evaluate predicted survival times, which were ranked in accordance with the observed survival times. All P-values were two-sided and P < 0.05 were considered significant. The outcome of implicit feature selection of the random forest was visualised using the Gini importance 18

Results
Characteristics of the study patients. We trained our CDSS system using the derivation set (N = 813) and validated it in the validation set (N = 208). Two sets were divided by stratified random splits. The same derivation and validation sets were used for both treatment recommendation and survival prediction models.  Table 2. Of the total 1,021 patients (mean age, 56.9 years), 81.8% were male, and 77.0% had positive hepatitis B virus surface antigen. Moreover, 76.3% of patients were classified with Child-Pugh class A, and 75.1% had ECOG score of 0. Regarding tumour-related factors, 41.7% of patients had multiple tumours, and the median maximal tumour diameter was 4.0 cm (IQR 2.3-8.5). Portal vein invasion and distant metastasis were confirmed in 22.8% and 12.2% of patients, respectively. BCLC stages 0, A, B, C, and D were observed in 13.4%, 26.0%, 18.0%, 36.6%, and 6.3% of patients, respectively. As an initial treatment, transplantation was performed in 4.5%, resection in 32.9%, RFA or PEIT in 7.5%, TACE in 31.5%, TACE combined with EBRT in 6.6%, sorafenib treatment in 3.0%, supportive care in 10.1%, and other therapies in 3.8% of patients. Among the other therapies, nine patients underwent resection combined with intraoperative RFA, nine underwent palliative resection, eight underwent EBRT to liver, six underwent TACE combined with sorafenib or cytotoxic chemotherapy, and four underwent intra-arterial cytotoxic chemotherapy. Moreover, three patients were enrolled in clinical trials and underwent systemic therapy. There was no significant difference between the derivation and validation set with respect to patient-, tumour-, or treatment-related variables.   Table 3 shows the accuracy of the six classifier models trained from the derivation set. The recommended treatment from the model was compared with the treatment used in real clinical practice in the validation set. Overall, our CDSS classifier model for HCC was well generalised and showed good performance, and its standard deviations were higher in the lower branches of the treatment (e.g. sorafenib treatment, supportive care, other therapies) as the number of patients were relatively smaller.  Figure S3 shows the importance of the features ranked by the Gini importance that calculates reduced impurity in all trees.  Figure S4 shows the importance of the features for OS prediction in each recommended treatment.

Discussion
In the present study, we developed a machine learning-based CDSS algorithm for recommending initial treatment option for HCC by employing clinical data from 1,021 patients. Treatment recommendations made by the CDSS model for HCC showed high accordance with the actual treatment choices, and the OS prediction was also highly associated with the observed 5-year survival rates. We present a detailed example of the application of the CDSS model for HCC (Fig. 3). A 43-year-old male patient had Child-Pugh class A and a 2-cm-sized single HCC without evidence of vascular invasion and extrahepatic metastasis. The patient's clinical details were as follows: ECOG score 0, haemoglobin 12.2 g/dL, platelet count 92 × 10 9 /mm3, albumin 3.4 g/dL, ALT 46 U/L, total bilirubin 1.0 mg/dL, creatinine 0.7 mg/dL, and AFP 42.4 ng/mL. The HCC CDSS model recommended resection as the initial treatment and the predicted 3-year and 5-year survival rates were 90.2% and 83.4%, respectively. The CDSS model for HCC also provided an estimated survival rate for RFA and transplantation, for which the predicted 5-year survival rates were 51.5% and 94.7%, respectively. In real clinical practice, this patient initially underwent resection, experienced HCC recurrence 3.2 year after the resection, received subsequent multiple on-demand TACE treatments, and still survived for a total of 6.9 years following resection.
For the development of the CDSS model for HCC, we adapted the machine learning method to overcome the complexity of treatment decision for HCC. We took special care in selecting the proper pretreatment variables and constructing high-quality database in order to ensure that the algorithm training goes well to produce applicable results. We first recruited 61 variables that are known to influence HCC treatment decision in daily clinical practice, and inputted them in a hierarchical classifier model. Through a refinement process, we finally Table 3. Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for the six classifier models in the validation set. EBRT external beam radiotherapy, NPV negative predictive value, PEIT percutaneous ethanol injection therapy, PPV positive predictive value, RFA radiofrequency ablation, TACE transarterial chemoembolisation. *Variables are presented as mean ± standard deviation.  www.nature.com/scientificreports/ To the best of our knowledge, this is the first description of a machine learning-based CDSS model developed for treatment decision and survival prediction in HCC. Our CDSS model for HCC not only provides the best treatment option, but also suggests alternative treatment and predicts prognosis after each treatment. Our results show that the CDSS model for HCC may be used as a supplementary system for physicians in deciding the treatment option for HCC and explaining their choice to the patients. Future multicentre studies using the HCC-CDSS model would allow for a more powerful comparison in the treatment patterns between centres and recommend treatment options with relative strength according to each centre.
Previous studies that used AI to study HCC have primarily focused on prognosis prediction after resection or TACE 6,9,10,21,22 . A recent study employed deep learning to identify multi-omics features associated with the differential survival of patients with HCC 23 . Compared to the algorithms used in previous studies, our algorithm focused more on the clinical and radiological parameters and could thus be more easily used in daily clinical practice. The integration of individual genetic information to the HCC CDSS model would enable physicians to make a more accurate selection for HCC treatment in each patient.
The Watson for Oncology, a cognitive computing system trained at the Memorial Sloan Kettering Cancer Center (New York, USA), uses natural language processing and machine learning to provide treatment recommendations. The Watson system processes structured and unstructured data from medical literature, treatment guidelines, medical records, imaging, laboratory and pathology reports, and the expertise of the physicians at Memorial Sloan Kettering to formulate therapeutic recommendations 24 . However, the Watson system has yet to be adapted for use in HCC, which may partially be due to complexity of factors that affect the treatment decision in HCC.
Our algorithm adapted manual database input in the development, and human efforts are certainly required during data acquisition. However, we already started another AI study, allowing us to automatically learn the radiological information of HCC, and training our algorithm more easily in the future learning process.
Our algorithm could not properly classify patients who received living related donor liver transplantation (LDLT) in the derivation set. As a possible explanation, it was generated in the medical environment of severe shortage of deceased liver donor. LDLT is the main method used when performing liver transplantation in our centre. Decision process of LDLT could be significantly different from that of other treatments, probably www.nature.com/scientificreports/ because not only availability of living donors and ethical and economic problems but also treatment willingness of the patients could influence significantly more to the decision of LDLT. Therefore, our algorithm could not recommend LDLT in the relevant patients. Hence, a different process in the decision of LDLT in patients with HCC within the Milan criteria should be considered. However, our algorithm could even predict the survival of a patient if he/she has a certain condition that requires LDLT as an initial treatment. The present study has the following limitations. First, in this study, we trained our algorithm only for the initial treatment option and not for subsequent treatments after recurrence. Moreover, our algorithm was trained using a database from a single centre located in a HBV-endemic area with mostly male patients. Therefore, the HCC-CDSS model may show less power when used in centres with different demographics (e.g. ethnicity, aetiology, level of hospital facility, socio-economic status of the country, and even reimbursement policy), where the optimal treatment option would be different. Second, this study comprised a relatively small sample size included for each treatment specially transplantation, RFA, and sorafenib treatment. Although the c-index of the survival prediction model for these treatments was at an acceptable level, additional validation is required. Therefore, we look forward to expanding our database with the collaboration with diverse medical centres through online web-site, allowing and to make our algorithm to be more suitable for use in diverse clinical environments. Finally, although patient's preference is one of the important factors in making treatment decisions, this variable was not included in this model. However, this cannot be quantified only by the patient's age and financial status. Therefore, we tried to compensate it by presenting the survival curve of preferred and alternative treatments.
We are more than willing to share our algorithm from the web-site with any centre worldside baseed on collaboration. This algorithm was built basically from our clinical practice and could function differently in other centres, but, hopefully, a future multicentre study could widen the usefulness of our algorithm as a method of efficacy comparison between different centres.
In conclusion, we developed HCC CDSS model for treatment decision and prognosis prediction in patients with HCC. This algorithm is considered benefical to physicians when discussing with HCC patients and when establishing a treatment decision for the appropriate initial treatment based on the estimated survival according to each treatment option, specially in HBV-endemic area. Further CDSS model with the integration of genetic information and automatically acquired imaging data could enable more individualised treatment to each patient.