Introduction

Hepatocellular carcinoma (HCC) accounts for ~90% of primary liver cancers [1]. The annual recurrence rate of HCC after surgical resection is ≥10% and reaches 70–80% after 5 years [2,3,4,5]. Therefore, it is clinically important to identify patients at high risk of HCC recurrence after curative surgical resection. Currently, there are no useful postoperative recurrence markers; thus, measurement of levels of tumor markers, such as alpha-fetoprotein and des-gamma-carboxy prothrombin, and periodic imaging tests, such as computed tomography and magnetic resonance imaging, are used to identify HCC recurrence. Previous studies have shown that early recurrence of HCC after resection is associated with low overall survival [6, 7]. Early recurrence after resection is also associated with tumor size, number of tumors, and portal vein invasion [8, 9]. However, pathological evaluation for early recurrence of HCC has not been described.

In digital pathology, machine learning (ML) approaches have been applied to a variety of image processing and classification tasks, with diagnosis methods reported for breast [10] and prostate [11] cancers. Image processing approaches are currently being used to evaluate the spatial arrangement and architecture of different types of tissue elements to predict clinical outcomes. Several examples of prognosis prediction methods using ML approaches based on pathological information have been described, including the relationship between the spatial arrangement of clusters of tumor-infiltrating lymphocytes and prognosis of non-small-cell lung carcinoma [12], prediction of overall survival for breast cancer [13], and the relationship between nuclear features of the stromal and epithelial compartments and prediction of human papillomavirus-positive oropharyngeal cancer [14]. Applications of ML to predict HCC recurrence include assessment of miRNA expression in exosomes [15] and CpG methylation signatures [16]. HCC diagnosis based on circulating miRNA information has also been reported [17]. However, to our knowledge, there are no reports of using images for HCC diagnosis and recurrence prediction.

Here, we applied support vector machine (SVM) methods to digital pathologic images of HE-stained specimens from resected tissues to predict the early recurrence of HCC after resection.

Materials and methods

Sample information

We included a total of 158 patients meeting the Milan criteria who underwent hepatic resection for HCC as curative treatment at Yamaguchi University (100 cases), Ogaki Municipal Hospital (47 cases), and Tokyo Medical University (11 cases). All cases were primary HCC. Our group previously showed that the pattern of HCC recurrence after surgical resection varies with the time since resection: recurrences within 2 years are mostly residual recurrences, recurrences at 2–4 years are residual recurrences and/or multicentric (MC) carcinogenesis, and recurrences after 4 years are MC carcinogenesis. Moreover, the recurrence rate after resection gradually decreases with each passing year [18]. The patients were categorized into three groups. Group I (39 cases) included patients with HCC recurrence within 1 year after hepatic resection. Group II (50 cases) included patients with HCC recurrence between 1 and 2 years after resection. Group III (69 cases) included patients without HCC recurrence at 4 years after resection (Table 1). To analyze recurrence prediction, we randomly selected 69 cases (16, 22, and 31 from Groups I–III, respectively) as the training set; the remaining 89 cases (23, 28, and 38 from Groups I–III, respectively) formed the model test set. The study was performed according to the principles of the Declaration of Helsinki and was approved by the ethical committees of Yamaguchi University, Ogaki Municipal Hospital, and Tokyo Medical University (SH4140).
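The per-group random split described above can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in the study; the case IDs and the random seed are hypothetical, and only the group sizes and training counts come from the text.

```python
import random

# Group sizes and training-set counts as reported in the text.
GROUP_SIZES = {"I": 39, "II": 50, "III": 69}
TRAIN_COUNTS = {"I": 16, "II": 22, "III": 31}


def split_cases(group_sizes, train_counts, seed=0):
    """Randomly assign the cases of each group to the training or test set."""
    rng = random.Random(seed)  # the seed is an illustrative assumption
    train, test = [], []
    for group, n_cases in group_sizes.items():
        # Hypothetical case IDs such as "I-001"; the real IDs are not given.
        case_ids = [f"{group}-{i:03d}" for i in range(1, n_cases + 1)]
        chosen = set(rng.sample(case_ids, train_counts[group]))
        train += [c for c in case_ids if c in chosen]
        test += [c for c in case_ids if c not in chosen]
    return train, test


train_cases, test_cases = split_cases(GROUP_SIZES, TRAIN_COUNTS)
print(len(train_cases), len(test_cases))  # 69 89
```

Sampling within each group, rather than from the pooled 158 cases, is what fixes the 16/22/31 and 23/28/38 composition of the two sets.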

Table 1 Clinical background.

Region of interest (ROI) selection and image size

Formalin-fixed, paraffin-embedded HE-stained slides were scanned using a whole slide image (WSI) scanner (NanoZoomer-RS; Hamamatsu Photonics, Hamamatsu, Japan) at ×20 image magnification. Images acquired by WSI were in ndpi (scanner original) format with an average size of 1 GB and were converted to tiff format for parsing because of memory limitations in the Windows operating system. Under low-magnification WSI, 20 and 10 ROIs per sample were selected in HCC and surrounding non-HCC areas, respectively. ROIs with strong necrosis, low tumor cell content, or high blood cell aggregation in the selected images were excluded from the analysis. The analysis included a total of 1369 ROIs from 69 training set images and 1346 ROIs from 89 test set images from HCC areas, as well as 738 ROIs from 69 training set images and 680 ROIs from 89 test set images from non-HCC areas (Fig. 1 and Supplementary Fig. 1).

Fig. 1: Image analysis process.

Whole slide image (WSI) scans of hepatocellular carcinoma (HCC) and surrounding non-HCC area (a), region of interest (ROI) selection for HCC (b), and non-HCC (f), nuclei selection by Ilastik for HCC (c) and non-HCC (g), mask image for HCC (d) and non-HCC (h), and analysis images for HCC (e) and non-HCC (i).

Nuclei segmentation on ROI images

Nuclei smaller than 80 pixels were removed as potential nucleus fragments using ilastik software (http://ilastik.org/). Each ROI was 2048 × 2048 pixels (at ×40 resolution), corresponding to a tissue area of 0.25 mm². A total of 970,986 and 886,900 nuclei in these ROIs were analyzed in the training and test sets for the HCC areas, respectively, and 328,463 and 251,376 nuclei were analyzed in the training and test sets for the non-HCC areas, respectively.
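To put the 80-pixel cutoff in physical units, the following sketch derives the pixel size implied by the stated ROI dimensions and applies the size filter to a list of object areas. It is a Python illustration only; the filtering in the study was done in ilastik, and the example areas are hypothetical.

```python
import math

ROI_SIDE_PX = 2048        # ROI edge length in pixels (from the text)
ROI_AREA_MM2 = 0.25       # tissue area per ROI (from the text)
MIN_NUCLEUS_AREA_PX = 80  # smaller objects are treated as fragments

# Edge length of one pixel in micrometres: sqrt(0.25 mm^2) = 0.5 mm = 500 um.
pixel_size_um = math.sqrt(ROI_AREA_MM2) * 1000 / ROI_SIDE_PX
# Physical area that corresponds to the 80-pixel threshold.
min_area_um2 = MIN_NUCLEUS_AREA_PX * pixel_size_um ** 2


def drop_fragments(areas_px, min_area_px=MIN_NUCLEUS_AREA_PX):
    """Keep only segmented objects whose pixel area reaches the threshold."""
    return [a for a in areas_px if a >= min_area_px]


# Hypothetical object areas (in pixels) for one ROI.
example_areas = [12, 79, 80, 350, 1020]
kept = drop_fragments(example_areas)
print(round(pixel_size_um, 3), round(min_area_um2, 2), kept)
# 0.244 4.77 [80, 350, 1020]
```

Under these figures, one pixel is about 0.244 µm on a side, so the cutoff removes objects smaller than roughly 4.8 µm².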

Quantitative nucleus and ROI feature measurement

The morphological features of the segmented nuclei were analyzed using CellProfiler (https://cellprofiler.org) and the Cell Feature Level Co-occurrence Matrix [19]. Each ROI contained 300–2000 nuclei, and 903 ROI features were derived from the nuclei information; examples include the average, standard deviation, and heterogeneity of morphological features such as nucleus size, contour line length, orientation, and roundness, as well as intra-nucleus texture (chromatin pattern) measures such as entropy, variance, and angular second moment.
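As a rough illustration of how per-nucleus measurements can be collapsed into ROI-level features, the sketch below computes the mean and standard deviation of each measurement over the nuclei of one ROI. The feature names, the values, and the use of the population standard deviation are assumptions made for this example; the actual 903 features were produced by CellProfiler and the Cell Feature Level Co-occurrence Matrix.

```python
from statistics import mean, pstdev


def roi_features(nuclei, feature_names):
    """Collapse per-nucleus measurements into per-ROI mean and SD features."""
    features = {}
    for name in feature_names:
        values = [nucleus[name] for nucleus in nuclei]
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_sd"] = pstdev(values)  # population SD (assumption)
    return features


# Hypothetical measurements for three nuclei in one ROI.
example_nuclei = [
    {"area": 100.0, "roundness": 0.9},
    {"area": 140.0, "roundness": 0.7},
    {"area": 120.0, "roundness": 0.8},
]
example_features = roi_features(example_nuclei, ["area", "roundness"])
print(example_features["area_mean"])  # 120.0
```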

Analysis of nuclear features and their rate of postoperative early recurrence

A random sample of nuclei (9907/565,421, 1.7%) from Groups I and III of the training set was selected, and an SVM model was created using 81 nucleus features generated from CellProfiler output (http://cellprofiler-manual.s3.amazonaws.com/CellProfiler-3.0.0/modules/measurement.html). For the ROI-based SVM discrimination, the 20 features that contributed most are listed in Supplementary Table 1; features with high weight values contributed strongly to the discrimination.

Analysis of the morphological features of ROIs

Recurrence prediction was performed with SVMs using both linear and radial basis function (RBF) kernels (e1071 package in R) (https://cran.r-project.org/web/packages/e1071/index.html). We used the training data to create the SVM model, which was then applied to the test data set. Because the SVM with a linear kernel showed higher prediction accuracy than the SVM with an RBF kernel, we report results based on the linear kernel.
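For readers less familiar with the two kernels compared here, the sketch below writes out their standard definitions in plain Python. It is not the R/e1071 code used in the study, and the gamma value in the example is an arbitrary illustrative choice.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the dot product of two feature vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical ROI feature vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]
print(linear_kernel(u, v))        # 11.0
print(rbf_kernel(u, u, gamma=0.5))  # 1.0
```

The linear kernel yields a single weight per feature, which is what makes a ranking of contributing features possible; the RBF kernel has no such direct per-feature weights.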

Results

Nuclei features-based analysis results

First, using the nucleus features output by CellProfiler, we analyzed the proportion of nuclei in each case that were classified as Group I. In 16 of the 23 Group I cases, this proportion was >20%. In contrast, in 22 of the 38 Group III cases, <10% of the nuclei were classified as characteristic of early recurrence (Table 2).
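The per-case proportion used here can be computed as in the following Python sketch, which assumes that the nucleus-level SVM returns one label per nucleus ("I" or "III", since it was trained on Groups I and III). The label list is hypothetical.

```python
def group_i_percentage(nucleus_labels):
    """Percentage of a case's nuclei that the nucleus-level SVM calls Group I."""
    if not nucleus_labels:
        raise ValueError("case has no nuclei")
    n_group_i = sum(1 for label in nucleus_labels if label == "I")
    return 100.0 * n_group_i / len(nucleus_labels)


# Hypothetical case: 7 of 20 nuclei are classified as Group I.
example_labels = ["I"] * 7 + ["III"] * 13
print(group_i_percentage(example_labels))  # 35.0
```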

Table 2 Group classification based on nuclear information.

ROI-based analysis results

Training an SVM model (linear kernel) to classify the ROIs of the HCC area into three groups yielded an accuracy of 99.8% on the training set (Table 3a). The ROIs of the non-HCC area were likewise classified into three groups using SVM, with an accuracy of 100% (Table 3b). When the classifiers built on the training set were verified using the test set, the proportions of correctly classified ROIs in the HCC and non-HCC areas were 80.6% and 68.1%, respectively (Table 3c, d).

Table 3 Region of interest (ROI)-based support vector machine (SVM) prediction of hepatocellular carcinoma (HCC) recurrence.

In addition, the ROI-level predictions within the HCC or non-HCC areas were summed for each case, and the accuracy of the classification among the three groups was verified on a case-by-case rather than an ROI basis. Each case was assigned to the group predicted for the largest number of its ROIs. The accuracies for HCC and non-HCC areas were 88.8% and 64.0%, respectively (Table 4a, b).
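The case-level rule above is a majority vote over ROI predictions. A minimal Python sketch follows; the ROI labels are hypothetical, and because the text does not say how ties were resolved, the tie behaviour here (the first group encountered wins) is an assumption.

```python
from collections import Counter


def case_label(roi_labels):
    """Assign a case to the group predicted for most of its ROIs."""
    counts = Counter(roi_labels)
    # most_common(1) returns the (label, count) pair with the highest count;
    # among tied labels, the one seen first is returned (assumed tie rule).
    return counts.most_common(1)[0][0]


# Hypothetical ROI-level predictions for one case.
example_rois = ["I", "I", "II", "III", "I", "II"]
print(case_label(example_rois))  # I
```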

Table 4 Case-based prediction of hepatocellular carcinoma (HCC) recurrence.

Aggregated case-based prediction results

Finally, three SVM models were integrated to predict HCC recurrence: the ROI-based SVMs for the HCC and non-HCC areas, and the nuclei feature-based SVM.

The values of A, B, and C were calculated as the averages of the probabilities for ROIs in the HCC areas predicted to be Groups I, II, and III, respectively. The values of D, E, and F were likewise calculated as the averages of the probabilities for ROIs in the non-HCC areas predicted to be Groups I, II, and III, respectively. For the nuclei feature-based model, G was defined as the percentage of a case's nuclei classified as Group I. The prediction algorithm is shown in Fig. 2.
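The averaging that yields A, B, and C (or D, E, and F for non-HCC ROIs) can be sketched as below. This Python example assumes that each ROI carries one probability per group from the SVM; the probability values are hypothetical.

```python
GROUPS = ("I", "II", "III")


def mean_group_probabilities(roi_probs):
    """Average the per-ROI group probabilities over all ROIs of one case."""
    if not roi_probs:
        raise ValueError("case has no ROIs")
    n = len(roi_probs)
    return tuple(sum(p[g] for p in roi_probs) / n for g in GROUPS)


# Hypothetical SVM probabilities for two HCC-area ROIs of one case.
hcc_rois = [
    {"I": 0.9, "II": 0.1, "III": 0.0},
    {"I": 0.7, "II": 0.2, "III": 0.1},
]
A, B, C = mean_group_probabilities(hcc_rois)
print(round(A, 2), round(B, 2), round(C, 2))  # 0.8 0.15 0.05
```

Applying the same function to the non-HCC ROIs of the case gives D, E, and F.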

Fig. 2: Algorithm for prediction using three support vector machine (SVM) models.

The average values of the probabilities predicted to be Groups I, II, and III in the region of interest (ROI) of the HCC area were set as A, B, and C, respectively, whereas the average values of the probabilities predicted to be Groups I, II, and III in the ROI in the non-HCC area were set as D, E, and F, respectively. The predicted probability of a nucleus belonging to Group I was set as G.

(1) If the value of G was ≥20%, it was assumed that the case was Group I or II. Next, if comparisons of the ROI values of the HCC area showed A > B, the case was categorized as Group I; similarly, if A < B, the case was categorized as Group II.

For example, Case 1 had a G value of 34.8, which is >20%. Next, since A was 0.92 and B was 0.08, A > B; thus, Case 1 was predicted to belong to Group I. Case 25 had a G value of 35.2, also >20%. As A was 0.01 and B was 0.70, this case was predicted to belong to Group II because A < B.

(2) If the G value was 10–19% and both A + B and D + E were ≤0.5, the case was predicted to be in Group III. If A + B and D + E were >0.5, the values of A, B, D, and E were compared. If the value of A or D was larger than the other values, the case was predicted to be in Group I; if the value of B or E was larger, the case was predicted to belong to Group II.

For example, Case 35 had a G value of 17.8. The values of A + B and D + E were 0.93 and 0.98, respectively, both of which were >0.5. Of A, B, D, and E, E was the largest, at 0.98; therefore, the case was predicted to belong to Group II. Similarly, the G value for Case 65 was 18.8. Since A + B was 0.47 and D + E was 0.18, both <0.5, the case was predicted to belong to Group III.

(3) When the value of G was <10%, the case was predicted to belong to Group I when A + B was >0.5 and A > B, to Group II when A + B was >0.5 and A < B, and to Group III when A + B was <0.5. For example, Case 41 had a G value of 4.0. A + B was >0.5 and A (0.02) was <B (0.75). Therefore, Case 41 was predicted to belong to Group II.
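Rules (1)–(3) can be written as a single function, as in the Python sketch below. This is one reading of the rules, not the authors' implementation. Several points the text leaves open are filled with labeled assumptions: the band boundaries (G ≥ 20, 10 ≤ G < 20, G < 10), the handling of ties, the case where only one of A + B and D + E exceeds 0.5, and the exact value A + B = 0.5 in rule (3). The worked cases use the values given in the text; where the text reports only a sum (Cases 35 and 65), the split into individual values is hypothetical.

```python
def predict_group(g, a, b, d=0.0, e=0.0):
    """Combine the three SVM outputs into one group label ("I", "II", "III").

    g: percentage of nuclei classified as Group I.
    a, b: mean HCC-area ROI probabilities for Groups I and II.
    d, e: mean non-HCC-area ROI probabilities for Groups I and II
          (only used in the middle G band).
    """
    if g >= 20:  # rule (1)
        return "I" if a > b else "II"  # a == b falls to "II" (assumption)
    if g >= 10:  # rule (2); values between 19 and 20 are assumed to land here
        if a + b <= 0.5 and d + e <= 0.5:
            return "III"
        # Assumption: if only one of the two sums exceeds 0.5, compare anyway.
        scores = {"A": a, "B": b, "D": d, "E": e}
        top = max(scores, key=scores.get)  # ties go to the first key (assumption)
        return "I" if top in ("A", "D") else "II"
    # rule (3): g < 10
    if a + b > 0.5:
        return "I" if a > b else "II"
    return "III"  # includes a + b == 0.5 (assumption)


print(predict_group(34.8, 0.92, 0.08))              # Case 1  -> I
print(predict_group(35.2, 0.01, 0.70))              # Case 25 -> II
print(predict_group(17.8, 0.40, 0.53, 0.00, 0.98))  # Case 35 -> II
print(predict_group(18.8, 0.20, 0.27, 0.08, 0.10))  # Case 65 -> III
print(predict_group(4.0, 0.02, 0.75))               # Case 41 -> II
```

Note that C and F do not enter the decision directly; Group III is reached only when the Group I and II probabilities are jointly low.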

With this algorithm, the integrated models showed an accuracy of 89.9% (80/89) (Table 5). Twenty-four cases were classified as Group I, of which 23 were actually Group I and the remaining one was Group II. Of the 35 cases predicted to be in Group II, 27 were actually Group II; the remaining 8 cases were Group III. All 30 cases predicted to be Group III were actually in Group III.
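The counts in this paragraph fully determine a 3 × 3 confusion matrix, from which the overall accuracy and per-group figures follow. The Python sketch below is a worked check of that arithmetic; the per-group recall and precision values are derived here and are not reported in the text.

```python
# Rows: actual group; columns: predicted group (counts taken from the text).
confusion = {
    "I":   {"I": 23, "II": 0,  "III": 0},
    "II":  {"I": 1,  "II": 27, "III": 0},
    "III": {"I": 0,  "II": 8,  "III": 30},
}
groups = ("I", "II", "III")

total = sum(confusion[actual][pred] for actual in groups for pred in groups)
correct = sum(confusion[g][g] for g in groups)
accuracy = correct / total

# Recall: share of each actual group that was predicted correctly.
recall = {g: confusion[g][g] / sum(confusion[g].values()) for g in groups}
# Precision: share of each predicted group that was actually that group.
precision = {
    g: confusion[g][g] / sum(confusion[actual][g] for actual in groups)
    for g in groups
}

print(f"accuracy = {correct}/{total} = {accuracy:.1%}")
# accuracy = 80/89 = 89.9%
```

Read this way, every Group I case was detected (recall 23/23) and every Group III prediction was correct (precision 30/30), while 8 of the 38 Group III cases were predicted as Group II.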

Table 5 Results of hepatocellular carcinoma (HCC) recurrence prediction by integrating three types of support vector machine (SVM).

The prediction algorithm was created on the training data set: one-third of the ROIs were removed as a validation set, and SVM models were created on the remaining data. This process was repeated three times (Supplementary Table 2).
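One way to realize this hold-out scheme is a three-fold rotation, sketched below in Python. The text does not state whether the three validation sets were disjoint, so the use of non-overlapping folds, the shuffling, and the seed are all assumptions; the ROI count is the number of HCC-area training ROIs given earlier.

```python
import random


def three_fold_splits(n_rois, seed=0):
    """Yield (training, validation) ROI index lists, one pair per round."""
    indices = list(range(n_rois))
    random.Random(seed).shuffle(indices)  # the seed is an illustrative assumption
    # Deal the shuffled indices into three near-equal, disjoint folds.
    folds = [indices[k::3] for k in range(3)]
    for k in range(3):
        validation = folds[k]
        training = [i for j in range(3) if j != k for i in folds[j]]
        yield training, validation


splits = list(three_fold_splits(1369))  # 1369 HCC-area training ROIs
print([len(validation) for _, validation in splits])  # [457, 456, 456]
```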

Prediction was performed using the average of the ROI probabilities for each case. The numbers of ROIs predicted to belong to each group are shown in Supplementary Table 3.

Discussion

The recurrence rate of surgically resected HCC is high, and until now, no predictive method has been described. The results of the present study demonstrate that recurrence after resection can be predicted efficiently by applying an ML-based approach to pathological findings.

The pathological evaluation for early recurrence of HCC has not been reported. HCC patients are at high risk for MC tumors due to the strong carcinogenic background of the liver. Recurrence patterns after treatment for HCC are diverse, including MC and intrahepatic metastasis. Because the mode of development and clinical course differ, it is important to distinguish between the two; however, it is difficult to do so clinically or pathologically [20].

We also performed classification among the three groups using tumor marker and histopathological information. The accuracy of prediction using the tumor markers alpha-fetoprotein (AFP) and des-gamma-carboxy prothrombin (DCP) was 44.8%, and that using the tumor and histopathological information was 53.8% (Supplementary Table 4). In addition, no significant differences between groups were demonstrated for any of the items used for these predictions (Supplementary Table 5). Moreover, while the recurrence risk factors for liver cancer include the number of tumors, size, vascular invasion, distant metastasis, etc., the patients’ backgrounds in the present study were normalized by enrolling patients according to the Milan criteria.

The prediction of breast and prostate cancer recurrence using pathological findings has been reported. However, image-based prognosis prediction in these conditions is based on glandular structures and tumor invasion patterns in adenocarcinomas, whereas the structure and invasion pattern in HCC differ from those of adenocarcinoma. Combining the ROI-based analysis of the HCC area with nuclear information from HCC areas and ROIs from non-HCC areas improved the prediction accuracy to 89.9%. All cases that recurred within 1 year were predicted to recur within 1 year; similarly, all cases predicted to have no recurrence did not recur. Among the cases with an incorrect prediction, 1/28 in Group II was predicted to recur within 1 year but recurred at 1–2 years, and 8/38 in Group III did not have a recurrence although they were predicted to have a recurrence in 1–2 years. Clinically, it is useful to predict both no recurrence, with no false-negative results, and early recurrence. However, our current prediction model is limited by the low number of cases used for analysis, and there is a possibility of overfitting, especially in the ROI-based SVM prediction for non-HCC areas. Nevertheless, the results were considered clinically useful because three SVM models were synthesized. We are currently conducting a study with an increased number of cases.

In conclusion, there has been no useful method to predict recurrence after HCC resection until now. We developed a recurrence prediction method based on ML by comprehensively using information on cancer tissue, peripheral non-cancerous tissue, and nuclei. While all cases are considered high risk after HCC resection, our method showed promise as a novel follow-up method to review the frequency of tests and determine the need for additional treatment.