Abstract
Hepatocellular carcinoma (HCC) is a representative primary liver cancer caused by long-term, repetitive liver injury. Surgical resection is generally selected as curative treatment. Because early recurrence of HCC after resection is associated with low overall survival, predicting recurrence after resection is clinically important. However, the pathological characteristics of early HCC recurrence have not yet been elucidated. We attempted to predict early recurrence of HCC after resection from digital pathology images of hematoxylin and eosin-stained specimens using machine learning with a support vector machine (SVM). A total of 158 HCC patients meeting the Milan criteria who underwent surgical resection were included in this study. The patients were categorized into three groups: Group I, patients with HCC recurrence within 1 year after resection (16 for training and 23 for testing); Group II, patients with HCC recurrence between 1 and 2 years after resection (22 and 28, respectively); and Group III, patients with no HCC recurrence within 4 years after resection (31 and 38, respectively). The SVM-based prediction method separated the three groups with an accuracy of 89.9% (80/89). All Group I cases were predicted correctly, whereas one Group II case was predicted to be Group III and eight Group III cases were predicted to be Group II. Digital pathology combined with machine learning could enable highly accurate prediction of HCC recurrence after surgical resection, especially early recurrence. Currently, follow-up after HCC resection relies in most cases on regular blood tests and diagnostic imaging; digital pathology coupled with machine learning offers potential as an objective method for postoperative follow-up.
Introduction
Hepatocellular carcinoma (HCC) accounts for ~90% of primary liver cancers [1]. The annual recurrence rate of HCC after surgical resection is ≥10%, and the cumulative rate reaches 70–80% at 5 years [2,3,4,5]. Therefore, it is clinically important to identify patients at high risk of HCC recurrence after curative surgical resection. Currently, there are no useful markers of postoperative recurrence; therefore, tumor markers such as alpha-fetoprotein and des-gamma-carboxy prothrombin are measured, and periodic imaging tests such as computed tomography and magnetic resonance imaging are performed, to detect HCC recurrence. Previous studies have shown that early recurrence of HCC after resection is associated with low overall survival [6, 7]. Early recurrence after resection is also associated with tumor size, number of tumors, and portal vein invasion [8, 9]. However, pathological evaluation of early HCC recurrence has not been described.
In digital pathology, machine learning (ML) approaches have been applied to a variety of image processing and classification tasks, with diagnostic methods reported for breast [10] and prostate [11] cancers. Image processing approaches are currently used to evaluate the spatial arrangement and architecture of different types of tissue elements to predict clinical outcomes. Several ML-based prognosis prediction methods using pathological information have been described, including the relationship between the spatial arrangement of clusters of tumor-infiltrating lymphocytes and prognosis in non-small-cell lung carcinoma [12], prediction of overall survival in breast cancer [13], and the relationship between nuclear features of the stromal and epithelial compartments and outcome in human papillomavirus-positive oropharyngeal cancer [14]. Applications of ML to predict HCC recurrence include assessment of miRNA expression in exosomes [15] and CpG methylation signatures [16]. HCC diagnosis based on circulating miRNA information has also been reported [17]. However, to our knowledge, there are no reports of using images for HCC diagnosis and recurrence prediction.
Here, we applied support vector machine (SVM) methods to digital pathology images of hematoxylin and eosin (HE)-stained specimens from resected tissues to predict early recurrence of HCC after resection.
Materials and methods
Sample information
We included a total of 158 patients meeting the Milan criteria who underwent hepatic resection for HCC as curative treatment at Yamaguchi University (100 cases), Ogaki Municipal Hospital (47 cases), and Tokyo Medical University (11 cases). All cases were primary HCC. Our group previously showed that the pattern of HCC recurrence after surgical resection varies with time after surgery: recurrence within 2 years is mostly residual recurrence, recurrence at 2–4 years is residual recurrence and/or multicentric (MC) carcinogenesis, and recurrence after 4 years is MC carcinogenesis. Moreover, the recurrence rate after resection gradually decreases with each passing year [18]. The patients were categorized into three groups. Group I (39 cases) included patients with HCC recurrence within 1 year after hepatic resection. Group II (50 cases) included patients with HCC recurrence between 1 and 2 years after resection. Group III (69 cases) included patients without HCC recurrence within 4 years after resection (Table 1). For the recurrence prediction analysis, we randomly selected 69 cases (16, 22, and 31 from Groups I–III, respectively) as the training set; the remaining 89 cases (23, 28, and 38 from Groups I–III, respectively) formed the test set. The study was performed according to the principles of the Declaration of Helsinki and was approved by the ethics committees of Yamaguchi University, Ogaki Municipal Hospital, and Tokyo Medical University (SH4140).
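The text states only that the training cases were drawn at random with fixed per-group counts. A minimal sketch of such a stratified random split, assuming simple shuffling within each group, is shown below; the function name `stratified_split`, the seed, and the case identifiers are hypothetical and are not taken from the study.

```python
import random

# Case counts per group and training-set sizes as reported in the text.
GROUP_SIZES = {"I": 39, "II": 50, "III": 69}
TRAIN_SIZES = {"I": 16, "II": 22, "III": 31}


def stratified_split(case_ids_by_group, train_sizes, seed=0):
    """Randomly draw a fixed number of training cases from each group.

    Returns (train, test) as lists of (case_id, group) tuples.
    The seed and the within-group shuffling are illustrative assumptions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for group, ids in case_ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        k = train_sizes[group]
        train += [(cid, group) for cid in shuffled[:k]]
        test += [(cid, group) for cid in shuffled[k:]]
    return train, test


# Hypothetical case identifiers such as "I-001" ... "III-069".
cases = {g: [f"{g}-{i:03d}" for i in range(1, n + 1)] for g, n in GROUP_SIZES.items()}
train, test = stratified_split(cases, TRAIN_SIZES)
print(len(train), len(test))  # 69 89
```

Fixing the number drawn per group (rather than sampling 69 cases from the pooled 158) keeps the group proportions of the training and test sets under direct control.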
Region of interest (ROI) selection and image size
Formalin-fixed, paraffin-embedded, HE-stained slides were scanned with a whole slide image (WSI) scanner (NanoZoomer-RS; Hamamatsu Photonics, Hamamatsu, Japan) at ×20 magnification. The WSIs, in the scanner's native ndpi format with an average size of 1 GB, were converted to TIFF format for parsing because of memory limitations in the Windows operating system. On low-magnification WSI views, 20 and 10 ROIs per sample were selected in HCC and surrounding non-HCC areas, respectively. ROIs with extensive necrosis, low tumor cell content, or marked blood cell aggregation were excluded from the analysis. The analysis included a total of 1369 ROIs from the 69 training set images and 1346 ROIs from the 89 test set images in HCC areas, as well as 738 ROIs from the 69 training set images and 680 ROIs from the 89 test set images in non-HCC areas (Fig. 1 and Supplementary Fig. 1).
Nuclei segmentation on ROI images
Nuclear objects smaller than 80 pixels were removed as potential nuclear fragments using ilastik software (http://ilastik.org/). Each ROI was 2048 × 2048 pixels (at ×40 resolution), corresponding to a tissue area of 0.25 mm². In total, 970,986 and 886,900 nuclei were analyzed in the ROIs of the training and test sets, respectively, for HCC areas, and 328,463 and 251,376 nuclei for non-HCC areas.
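ilastik performs this size filtering internally; the sketch below only illustrates the operation itself. It assumes a binary nucleus mask stored as a list of rows and uses 4-connectivity, which is an assumption because the connectivity used in the study is not stated. The function name `drop_small_components` is hypothetical.

```python
from collections import deque

MIN_NUCLEUS_PIXELS = 80  # size threshold stated in the text


def drop_small_components(mask, min_size=MIN_NUCLEUS_PIXELS):
    """Return a copy of a 0/1 mask with 4-connected objects smaller than min_size removed."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Flood-fill one connected component and collect its pixels.
                component = []
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    component.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                # Objects below the threshold are treated as nuclear fragments.
                if len(component) < min_size:
                    for cy, cx in component:
                        out[cy][cx] = 0
    return out


mask = [[0] * 20 for _ in range(20)]
for y in range(10):
    for x in range(10):
        mask[y][x] = 1  # 100-pixel object: kept
for y in range(12, 17):
    for x in range(12, 17):
        mask[y][x] = 1  # 25-pixel object: removed as a fragment
filtered = drop_small_components(mask)
print(sum(map(sum, filtered)))  # 100
```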
Quantitative nucleus and ROI feature measurement
The morphological features of the segmented nuclei were analyzed using CellProfiler (https://cellprofiler.org) and the cell feature level co-occurrence matrix [19]. Each ROI contained 300–2000 nuclei, from which 903 ROI features were derived, for example, the average, standard deviation, and heterogeneity of morphological features such as nucleus size, contour length, orientation, and roundness, and of intranuclear texture (chromatin pattern) features such as entropy, variance, and angular second moment.
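The 903 ROI features were computed with CellProfiler and the cell feature level co-occurrence matrix [19]; the sketch below illustrates only the general idea of summarizing per-nucleus measurements into ROI-level statistics. It computes the mean, the sample standard deviation, and the Shannon entropy of a fixed-bin histogram, the latter used here as a simple heterogeneity proxy; that proxy, the bin count, the function name `roi_features`, and the feature name `area` are assumptions, not the published method.

```python
import math
import statistics


def roi_features(nuclei, feature_names, n_bins=10):
    """Aggregate per-nucleus measurements of one ROI into ROI-level features.

    nuclei: list of dicts mapping feature name -> value (at least 2 nuclei).
    Returns mean, sample SD, and histogram entropy (bits) for each feature.
    """
    features = {}
    for name in feature_names:
        values = [n[name] for n in nuclei]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins when all values match
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        total = len(values)
        entropy = -sum(c / total * math.log2(c / total) for c in counts if c)
        features[f"{name}_mean"] = statistics.fmean(values)
        features[f"{name}_sd"] = statistics.stdev(values)
        features[f"{name}_entropy"] = entropy
    return features


# Hypothetical nuclear areas (pixels) for a four-nucleus ROI.
f = roi_features([{"area": 100}, {"area": 200}, {"area": 300}, {"area": 400}], ["area"], n_bins=2)
print(f["area_mean"], f["area_entropy"])  # 250.0 1.0
```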
Analysis of nuclear features and postoperative early recurrence
Nuclei (9907/565,421, 1.7%) from Groups I and III of the training set were randomly selected, and an SVM model was built using 81 nuclear features generated from CellProfiler output (http://cellprofiler-manual.s3.amazonaws.com/CellProfiler-3.0.0/modules/measurement.html). For ROI-based SVM discrimination, the 20 most highly contributing features are listed in Supplementary Table 1; features with higher weights contributed more strongly to the discrimination.
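Ranking features by the magnitude of their linear-SVM weights is one common way to produce a list like Supplementary Table 1; the text does not specify how the ranking was computed, so the sketch below is an assumption. It takes a weight vector (for example, one extracted from a fitted binary linear SVM) and returns the k features with the largest absolute weights. The function name and the feature names in the usage line are hypothetical.

```python
def top_features(weights, names, k=20):
    """Return the k (name, weight) pairs with the largest absolute linear-SVM weight.

    Comparing |w| across features is only meaningful if the features were
    standardized before training (an assumption in this sketch).
    """
    if len(weights) != len(names):
        raise ValueError("weights and names must have the same length")
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return ranked[:k]


# Hypothetical weights for four nuclear features.
print(top_features([0.1, -2.0, 0.5, 1.2],
                   ["area", "perimeter", "eccentricity", "texture_entropy"], k=2))
```

Using the absolute value keeps strongly negative weights, which push predictions toward the other class, in the ranking.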
Analysis of the morphological features of ROIs
Recurrence prediction was performed with SVMs using both linear and radial basis function (RBF) kernels (e1071 package in R; https://cran.r-project.org/web/packages/e1071/index.html). SVM models were built on the training data and applied to the test data set. Because the linear-kernel SVM showed higher prediction accuracy than the RBF-kernel SVM, we report results based on the linear kernel.
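For reference, the two kernels compared here can be written out as small functions. This is a sketch of the kernel definitions only, not of SVM training; the default `gamma` of 1/(number of features) mirrors the default documented for e1071's `svm`, and the function names are illustrative.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=None):
    """RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    If gamma is None, use 1 / (number of features).
    """
    if gamma is None:
        gamma = 1.0 / len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


print(linear_kernel([1, 2, 3], [4, 5, 6]))  # 32
```

A linear kernel yields a separating hyperplane with one weight per feature, which is also what makes per-feature weights interpretable. The RBF kernel can fit non-linear boundaries but has an extra parameter (gamma) that must be tuned.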
Results
Nuclei features-based analysis results
First, using the nuclear features output by CellProfiler, we analyzed the proportion of nuclei in each case that were classified as Group I. In 16 of the 23 Group I cases, this proportion was >20%. In contrast, in 22 of the 38 Group III cases, <10% of nuclei were classified as having early-recurrence characteristics (Table 2).
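The per-case proportion analyzed here reduces to a simple percentage over nucleus-level predictions. A minimal sketch, assuming the nuclear SVM (trained on Groups I and III only) outputs one label per nucleus; the function name and label strings are hypothetical.

```python
def percent_group_i(nucleus_predictions):
    """Percentage of a case's nuclei labeled Group I by the nuclear SVM.

    nucleus_predictions: iterable of labels ("I" or "III"), one per analyzed nucleus.
    """
    labels = list(nucleus_predictions)
    if not labels:
        raise ValueError("case has no analyzed nuclei")
    return 100.0 * sum(1 for lab in labels if lab == "I") / len(labels)


print(percent_group_i(["I"] * 3 + ["III"] * 7))  # 30.0
```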
ROI-based analysis results
Classification of HCC-area ROIs into the three groups with the linear-kernel SVM showed an accuracy of 99.8% on the training set (Table 3a). Classification of non-HCC-area ROIs into the three groups showed an accuracy of 100% on the training set (Table 3b). When the classifiers built on the training set were applied to the test set, the accuracies of ROI classification in the HCC and non-HCC areas were 80.6% and 68.1%, respectively (Table 3c, d).
In addition, ROI predictions within the HCC or non-HCC areas were aggregated, and classification accuracy among the three groups was evaluated on a case basis rather than an ROI basis: each case was assigned to the group into which the largest number of its ROIs were classified. The case-based accuracies for HCC and non-HCC areas were 88.8% and 64.0%, respectively (Table 4a, b).
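The case-level assignment described here is a majority vote over ROI labels. A minimal sketch follows. The text does not say how ties were handled, so the tie rule (the first label encountered wins, which is how `Counter.most_common` orders equal counts) is an arbitrary assumption.

```python
from collections import Counter


def case_by_majority(roi_labels):
    """Assign a case to the group predicted for the largest number of its ROIs.

    Ties go to the label encountered first (arbitrary; not specified in the text).
    """
    return Counter(roi_labels).most_common(1)[0][0]


print(case_by_majority(["I", "II", "I", "III", "I"]))  # I
```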
Aggregated case-based prediction results
Finally, three SVM models (the ROI-based models for HCC and non-HCC areas and the nuclear feature-based model) were integrated to predict HCC recurrence.
A, B, and C were calculated as the average probabilities that the ROIs in the HCC area belonged to Groups I, II, and III, respectively. Likewise, D, E, and F were calculated as the average probabilities that the ROIs in the non-HCC area belonged to Groups I, II, and III, respectively. For the nuclear feature-based model, G was defined as the percentage of a case's nuclei predicted to belong to Group I. The prediction algorithm is shown in Fig. 2.
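A minimal sketch of how the averaged probabilities could be computed for one case, assuming each ROI comes with a probability for each of the three groups (e.g. from an SVM fitted with probability estimates). Applied to HCC-area ROIs it yields (A, B, C); applied to non-HCC-area ROIs it yields (D, E, F). The function name and input layout are hypothetical.

```python
from statistics import fmean

GROUPS = ("I", "II", "III")


def mean_group_probabilities(roi_probs):
    """Average the per-ROI class probabilities over all ROIs of one case.

    roi_probs: list of dicts such as {"I": 0.7, "II": 0.2, "III": 0.1}, one per ROI.
    Returns a tuple of mean probabilities for Groups I, II, and III.
    """
    return tuple(fmean(p[g] for p in roi_probs) for g in GROUPS)


A, B, C = mean_group_probabilities([
    {"I": 0.8, "II": 0.1, "III": 0.1},  # hypothetical ROI 1
    {"I": 0.6, "II": 0.3, "III": 0.1},  # hypothetical ROI 2
])
print(round(A, 2), round(B, 2), round(C, 2))  # 0.7 0.2 0.1
```

Because each ROI's probabilities sum to 1, A + B + C (and D + E + F) also sum to 1. This is why A + B can be read as the averaged probability of recurrence within 2 years.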
(1) If G was ≥20%, the case was assumed to belong to Group I or II. If comparison of the HCC-area ROI values showed A > B, the case was assigned to Group I; if A < B, it was assigned to Group II.
For example, Case 1 had a G value of 34.8, which is >20%. Next, since A was 0.92 and B was 0.08, A > B; thus, Case 1 was predicted to belong to Group I. Case 25 had a G value of 35.2, also >20%. As A was 0.01 and B was 0.70, this case was predicted to belong to Group II because A < B.
(2) If G was 10–19% and both A + B and D + E were ≤0.5, the case was predicted to be in Group III. Otherwise, the values of A, B, D, and E were compared: if A or D was the largest, the case was predicted to be in Group I; if B or E was the largest, the case was predicted to be in Group II.
For example, Case 35 had a G value of 17.8. The values of A + B and D + E were 0.93 and 0.98, respectively, both of which were >0.5. Of A, B, D, and E, E was the largest, at 0.98; therefore, the case was predicted to belong to Group II. Similarly, the G value for Case 65 was 18.8. Since A + B was 0.47 and D + E was 0.18, both <0.5, the case was predicted to belong to Group III.
(3) If G was ≤10%, the case was predicted to belong to Group III if A + B < 0.5; if A + B > 0.5, it was predicted to belong to Group I when A > B and to Group II when A < B. For example, Case 41 had a G value of 4.0. A + B was >0.5, and A (0.02) was smaller than B (0.75); therefore, Case 41 was predicted to belong to Group II.
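Rules (1)–(3) can be expressed as a single decision function. This is a sketch, not the authors' code. The boundary handling is an assumption: the text gives G ≥20%, 10–19%, and ≤10%, so here G ≥ 20 applies rule 1, 10 < G < 20 applies rule 2, and G ≤ 10 applies rule 3. Exact ties (A == B, or A + B exactly 0.5 in rule 3) are resolved arbitrarily. The function name is hypothetical.

```python
def predict_group(G, A, B, D, E):
    """Integrate the three SVM outputs for one case using rules (1)-(3).

    G: percentage of the case's nuclei predicted as Group I.
    A, B: mean Group I / Group II probabilities of HCC-area ROIs.
    D, E: mean Group I / Group II probabilities of non-HCC-area ROIs.
    """
    if G >= 20:
        # Rule 1: Group I or II, decided by the HCC-area ROI probabilities.
        return "I" if A > B else "II"
    if G > 10:
        # Rule 2: Group III only if both areas give a low recurrence probability.
        if A + B <= 0.5 and D + E <= 0.5:
            return "III"
        scores = {"A": A, "B": B, "D": D, "E": E}
        largest = max(scores, key=scores.get)
        return "I" if largest in ("A", "D") else "II"
    # Rule 3: only the HCC-area ROI probabilities are used.
    if A + B < 0.5:
        return "III"
    return "I" if A > B else "II"


# Worked examples from the text; where only a sum (A + B or D + E) is reported,
# the split into individual values, and any unreported D or E, is hypothetical.
print(predict_group(34.8, 0.92, 0.08, 0.0, 0.0))   # Case 1  -> I
print(predict_group(17.8, 0.03, 0.90, 0.0, 0.98))  # Case 35 -> II
print(predict_group(18.8, 0.20, 0.27, 0.10, 0.08)) # Case 65 -> III
```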
With this algorithm, the integrated models showed an accuracy of 89.9% (80/89) (Table 5). Twenty-four cases were classified as Group I, of which 23 were actually in Group I and the remaining one was in Group II. Of the 35 cases predicted to be in Group II, 27 were actually in Group II; the remaining 8 were in Group III. All 30 cases predicted to be in Group III were actually in Group III.
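The counts in the paragraph above can be arranged as a confusion matrix from which the overall accuracy and per-group recall follow directly. The sketch uses only the numbers stated in that paragraph; the helper names are illustrative.

```python
# Rows: predicted group; columns: actual group, using the counts stated above.
confusion = {
    "I":   {"I": 23, "II": 1,  "III": 0},
    "II":  {"I": 0,  "II": 27, "III": 8},
    "III": {"I": 0,  "II": 0,  "III": 30},
}


def accuracy(cm):
    """Return (correct, total, fraction correct) for a square confusion matrix."""
    correct = sum(cm[g][g] for g in cm)
    total = sum(sum(row.values()) for row in cm.values())
    return correct, total, correct / total


def recall(cm, group):
    """Fraction of cases actually in `group` that were predicted as `group`."""
    actual = sum(cm[pred][group] for pred in cm)
    return cm[group][group] / actual


correct, total, acc = accuracy(confusion)
print(correct, total, round(100 * acc, 1))  # 80 89 89.9
```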
The prediction algorithm was developed on the training set: one-third of the ROIs were held out as a validation set, and SVM models were built on the remaining data. This process was repeated three times (Supplementary Table 2).
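A minimal sketch of the hold-out procedure, assuming the three held-out thirds are disjoint (i.e., 3-fold cross-validation over ROIs). The text does not state whether the thirds overlapped, so this, the seed, and the function name `three_way_holdout` are assumptions.

```python
import random


def three_way_holdout(roi_ids, seed=0):
    """Yield (train_ids, validation_ids) for three rounds, each holding out one third.

    The three validation thirds are disjoint and together cover all ROIs.
    """
    ids = list(roi_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::3] for i in range(3)]
    for k in range(3):
        validation = folds[k]
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train, validation


rounds = list(three_way_holdout(range(1369)))  # e.g. the 1369 HCC-area training ROIs
print([len(v) for _, v in rounds])
```

Splitting at the ROI level places ROIs from the same case in both subsets. Grouping the split by case is a common alternative when case-level leakage is a concern.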
Predictions were based on the average probabilities of the ROIs belonging to each case. The numbers of ROI predictions are shown in Supplementary Table 3.
Discussion
The recurrence rate of surgically resected HCC is high, and no predictive method has been described until now. The results of the present study demonstrate that recurrence after resection can be predicted efficiently by applying an ML-based approach to pathological findings.
Pathological evaluation of early HCC recurrence has not been reported. HCC patients are at high risk of MC tumors because of the strong carcinogenic background of the liver. Recurrence patterns after treatment for HCC are diverse and include MC carcinogenesis and intrahepatic metastasis. Because their modes of development and clinical courses differ, it is important to distinguish between the two; however, this is difficult both clinically and pathologically [20].
When we classified the three groups using tumor marker and histopathological information, the prediction accuracy was 44.8% using the tumor markers alpha-fetoprotein (AFP) and des-gamma-carboxy prothrombin (DCP) and 53.8% using tumor and histopathological information (Supplementary Table 4). In addition, no significant differences between groups were demonstrated for any of the items used in these predictions (Supplementary Table 5). Moreover, although risk factors for liver cancer recurrence include the number and size of tumors, vascular invasion, and distant metastasis, the patients' backgrounds in the present study were standardized by enrolling only patients who met the Milan criteria.
The prediction of breast and prostate cancer recurrence using pathological findings has been reported. However, image-based prognosis prediction in these cancers relies on glandular structures and tumor invasion patterns in adenocarcinomas, whereas the structure and invasion pattern of HCC differ from those of adenocarcinoma. Beyond the ROI-based analysis of the HCC area, combining nuclear information from HCC areas with ROIs from non-HCC areas improved the prediction accuracy to 89.9%. The analysis showed that all cases predicted to recur within 1 year did recur within 1 year; similarly, all cases predicted to have no recurrence did not recur. The incorrectly predicted cases were 1 of 28 in Group II and 8 of 38 in Group III; the latter did not recur although they were predicted to recur within 1–2 years. Clinically, it is useful to predict both early recurrence and the absence of recurrence without false-negative results. Our current prediction model has limitations: the number of cases analyzed was small, and overfitting is possible, especially in the ROI-based SVM predictions for non-HCC areas. Nevertheless, we consider the results clinically useful because they integrate three SVM models. We are currently conducting a study with a larger number of cases.
In conclusion, no useful method for predicting recurrence after HCC resection has been available until now. We developed an ML-based recurrence prediction method that comprehensively uses information on cancer tissue, surrounding non-cancerous tissue, and nuclei. Although all patients are considered at high risk after HCC resection, our method shows promise as a novel follow-up tool for adjusting the frequency of tests and determining the need for additional treatment.
References
El-Serag HB, Rudolph KL. Hepatocellular carcinoma: epidemiology and molecular carcinogenesis. Gastroenterology. 2007;132:2557–76.
Portolani N, Coniglio A, Ghidoni S, Giovanelli M, Benetti A, Tiberio GA, et al. Early and late recurrence after liver resection for hepatocellular carcinoma: prognostic and therapeutic implications. Ann Surg. 2006;243:229–35.
Shah SA, Cleary SP, Wei AC, Yang I, Taylor BR, Hemming AW, et al. Recurrence after liver resection for hepatocellular carcinoma: risk factors, treatment, and outcomes. Surgery. 2007;141:330–9.
Sherman M. Recurrence of hepatocellular carcinoma. N Engl J Med. 2008;359:2045–7.
Shimada M, Takenaka K, Gion T, Fujiwara Y, Kajiyama K, Maeda T, et al. Prognosis of recurrent hepatocellular carcinoma: a 10-year surgical experience in Japan. Gastroenterology. 1996;111:720–6.
Takeishi K, Maeda T, Tsujita E, Yamashita Y, Harada N, Itoh S, et al. Predictors of intrahepatic multiple recurrences after curative hepatectomy for hepatocellular carcinoma. Anticancer Res. 2015;35:3061–6.
Shah SA, Greig PD, Gallinger S, Cattral MS, Dixon E, Kim RD, et al. Factors associated with early recurrence after resection for hepatocellular carcinoma and outcomes. J Am Coll Surg. 2006;202:275–83.
Shamai G, Binenbaum Y, Slossberg R, Duek I, Gil Z, Kimmel R, et al. Artificial intelligence algorithms to assess hormonal status from tissue microarrays in patients with breast cancer. JAMA Netw Open. 2019;2:e197700.
Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck-Krauss-Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med. 2019. https://doi.org/10.1038/s41591-019-0508-1.
Veta M, Kornegoor R, Huisman A, Verschuur-Maes AH, Viergever MA, Pluim JP, et al. Prognostic value of automatically extracted nuclear morphometric features in whole slide images of male breast cancer. Mod Pathol. 2012;25:1559–65.
Lee G, Sparks R, Ali S, Shih NN, Feldman MD, Spangler E, et al. Co-occurring gland angularity in localized subgraphs: predicting biochemical recurrence in intermediate-risk prostate cancer patients. PLoS ONE. 2014;9:e97954.
Corredor G, Wang X, Zhou Y, Lu C, Fu P, Syrigos K, et al. Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer. Clin Cancer Res. 2019;25:1526–34.
Yuan Y. Modelling the spatial heterogeneity and molecular correlates of lymphocytic infiltration in triple-negative breast cancer. J R Soc Interface. 2015;12. https://doi.org/10.1098/rsif.2014.1153.
Ali S, Lewis J, Madabhushi A. Spatially aware cell cluster (spACC1) graphs: predicting outcome in oropharyngeal p16+ tumors. Med Image Comput Comput Assist Interv. 2013;16(Pt 1):412–9.
Itami-Matsumoto S, Hayakawa M, Uchida-Kobayashi S, Enomoto M, Tamori A, Mizuno K, et al. Circulating exosomal miRNA profiles predict the occurrence and recurrence of hepatocellular carcinoma in patients with direct-acting antiviral-induced sustained viral response. Biomedicines. 2019;7. https://doi.org/10.3390/biomedicines7040087.
Qiu J, Peng B, Tang Y, Qian Y, Guo P, Li M, et al. CpG methylation signature predicts recurrence in early-stage hepatocellular carcinoma: results from a multicenter study. J Clin Oncol. 2017;35:734–42.
Yamamoto Y, Kondo S, Matsuzaki J, Esaki M, Okusaka T, Shimada K, et al. Highly sensitive circulating microRNA panel for accurate detection of hepatocellular carcinoma in patients with liver disease. Hepatol Commun. 2020;4:284–97.
Sakon M, Umeshita K, Nagano H, Eguchi H, Kishimoto S, Miyamoto A, et al. Clinical significance of hepatic resection in hepatocellular carcinoma analysis by disease-free survival curves. Arch Surg. 2000;135:1456–9.
Saito A, Numata Y, Hamada T, Horisawa T, Cosatto E, Graf HP, et al. A novel method for morphological pleomorphism and heterogeneity quantitative measurement: named cell feature level co-occurrence matrix. J Pathol Inform. 2016;7:36.
Singal AK, Freeman DH Jr., Anand BS. Meta-analysis: interferon improves outcomes following ablation or resection of hepatocellular carcinoma. Aliment Pharmacol Ther. 2010;32:851–8.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Saito, A., Toyoda, H., Kobayashi, M. et al. Prediction of early recurrence of hepatocellular carcinoma after resection using digital pathology images assessed by machine learning. Mod Pathol 34, 417–425 (2021). https://doi.org/10.1038/s41379-020-00671-z
DOI: https://doi.org/10.1038/s41379-020-00671-z