CT-radiomics and clinical risk scores for response and overall survival prognostication in TACE HCC patients

We aimed to identify hepatocellular carcinoma (HCC) patients who will respond to repetitive transarterial chemoembolization (TACE) to improve the treatment algorithm. Retrospectively, 61 patients (mean age, 65.3 years ± 10.0 [SD]; 49 men) with 94 HCC mRECIST target lesions who had three consecutive TACE between 01/2012 and 01/2020 were included. Robust and non-redundant radiomics features were extracted from the 24 h post-embolization CT. Five different clinical TACE scores were assessed. Seven different feature selection methods and seven machine learning models were used. Radiomics, clinical and combined models were built to predict response to TACE on a lesion-wise and patient-wise level as well as its impact on overall survival prognostication. In the test set, 29 target lesions of 19 patients were evaluated. Response rates were 37.9% (11/29) on the lesion level and 42.1% (8/19) on the patient level. The top radiomics models achieved AUCs of 0.55–0.67 for lesion-wise response prognostication. Clinical scores revealed top AUCs of 0.65–0.69. The best working model combined the radiomic feature LargeDependenceHighGrayLevelEmphasis and the clinical score mHAP_II_score_group with AUC = 0.70, accuracy = 0.72. Transferred to the patient level, this model achieved AUC = 0.62, CI = 0.41–0.83. Using these two radiomics-clinical features, overall survival prognostication reached a C-index of 0.67. In conclusion, a random forest model using the radiomic feature LargeDependenceHighGrayLevelEmphasis and the clinical mHAP-II-score-group seems promising for TACE response prognostication.

In 2020, primary liver cancer ranked as the third leading cause of cancer death worldwide 1. Hepatocellular carcinoma (HCC) comprises around 75-85% of primary liver cancers, and its incidence has been rising over the last 20 years 1,2. The diagnostic work-up of HCC-suspicious observations includes, among others, clinical examinations, laboratory analysis, imaging studies and often tumor biopsy 2. The treatment of HCC is complex and depends on the tumor stage. Potentially curative treatments include liver resection, transplantation or local ablative methods like microwave ablation 2. HCC is predominantly arterially vascularized, which enables the intra-arterial application of chemotherapy and embolization 2. These methods, like transarterial chemoembolization (TACE), are mainly palliative but may enable complete destruction of the tumor or size reduction to allow subsequent resection or transplantation (bridging therapy) in selected cases 2,3. TACE can prolong patients' overall survival (OS), but depending on patient selection it may also harm patients and reduce OS 2. A multitude of scores has been developed to identify patients who will most likely benefit from TACE 2,4-8. Nevertheless, evidence for the scores' validity is scarce and their use for treatment decision making is not recommended outside clinical trials 2. Consequently, patients are generally discussed individually in interdisciplinary tumor board meetings to define the appropriate therapy based on expert consensus. Recent advances in the field of quantitative computational image analysis, termed radiomics, provide promising opportunities. Images are transformed into mineable data with subsequent bioinformatic analysis, allowing lesion characterization beyond visual perception 9. Radiomics' prognostic and predictive potential has been demonstrated in numerous cancer entities 9,10.
Only scarce evidence is available for TACE in HCC patients, and most studies examined the pre-TACE contrast-enhanced MRI or CT, although varying contrast agents or injection protocols might alter the results 11-14. Lipiodol accumulation patterns after TACE might be used for response prognostication 15,16, but to the best of our knowledge a high-dimensional pattern quantification by means of radiomics has not yet been performed. We hypothesized that lipiodol retention patterns from the post-embolization CT after the first TACE can be quantified by means of radiomics to serve as imaging biomarkers for TACE response prediction. The aim of this study was to develop a predictive model for HCC patients on a (I) lesion-wise level, (II) patient-wise level and (III) for overall survival. Further, we aimed to identify the best working model by comparing CT-derived features with clinical scores and a holistic combined model.

Methods
Written informed consent was obtained from all patients and the study was approved by the institutional review boards of the University Cancer Center and the Ethical Committee at the University Hospital Frankfurt (project number: SGI-10-2020). The patient population has not been reported previously.

Study design.
In this retrospective study we consecutively enrolled 61 HCC patients (female, 12; mean age, 65.3 ± 10.0 years) who were treated with conventional TACE between 01/2012 and 01/2020. Inclusion criteria were: (1) histologically confirmed HCC, (2) three consecutive TACE exclusively with the therapeutics Mitomycin C (Medac®, Hamburg, Germany) and Lipiodol (Guerbet GmbH, France) ± degradable starch microspheres (EmboCept®S, PharmaCept GmbH, Berlin, Germany), injected in the same liver region, (3) all mRECIST target lesions (TL) treated with each TACE, (4) post-TACE unenhanced CT 24 h after TACE, (5) contrast-enhanced arterial and portal-venous/delayed phase MRI or CT prior to the first and after the third TACE. Exclusion criteria were: (1) consecutive TACE applied in different liver regions, (2) time interval between first and last TACE > 6 months, (3) prior local therapy of TLs, (4) no TLs, (5) insufficient image quality, (6) other chemotherapeutic agents. In total, 61 patients met the criteria and were evaluated. Fig. 1 depicts the flow chart of patient inclusion following STARD. A scheme of the study's workflow is shown in Supplementary Data S1.

Image segmentation and preprocessing.
The image stack was visualized and processed using the 3D Slicer software platform (http://slicer.org, version 4.9.0) 19,20. We resampled the images to a spacing of 1 mm × 1 mm × 1 mm prior to feature extraction. One blinded investigator (OE, board-certified radiologist, 10 years of experience) tagged and segmented a maximum of two TLs per patient using the 24 h post-embolization CT after the first TACE. The tagged TLs were independently segmented by a second blinded investigator (SB, radiologist-in-training, 3.5 years of experience). Segmentation was performed as follows: a three-dimensional volume of interest (VOI) was manually drawn in the HCC lesion, sparing equivocal border zones. The semi-automatic "grow from seeds" algorithm was used to expand the VOI to match the whole tumor habitat 20-22. Clear foci of segmentation error were manually erased using the brush-erase tool. A representative segmentation is shown in Fig. 2.
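As a minimal sketch of what resampling to an isotropic 1 mm grid implies for the voxel matrix, the following example computes the resampled grid size from the original size and spacing. The function name `isotropic_grid`, the matrix size and the in-plane spacing are hypothetical values chosen for illustration; only the 1 mm target spacing and the 5 mm slice thickness of the standard-of-care CT come from this study. It is not the 3D Slicer implementation.

```python
def isotropic_grid(size, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Return the voxel grid size after resampling to new_spacing.

    size: voxels per axis (x, y, z); spacing: mm per voxel per axis.
    The physical extent (size * spacing) is preserved up to rounding.
    """
    return tuple(
        max(1, int(round(n * s / t)))
        for n, s, t in zip(size, spacing, new_spacing)
    )


# Hypothetical CT volume: 512 x 512 matrix at 0.75 mm in-plane spacing,
# 60 slices of 5 mm thickness (assumed values, for illustration only).
original_size = (512, 512, 60)
original_spacing = (0.75, 0.75, 5.0)

new_size = isotropic_grid(original_size, original_spacing)

# Factor by which the number of slices grows along z. These additional
# slices are interpolated and carry no newly acquired information.
z_upsampling = new_size[2] / original_size[2]
```

Under these assumed values the slice count grows fivefold along the z-axis purely through interpolation, which is why true thin-slice reconstructions would be preferable to resampled 5 mm data.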
Inter-observer robustness and feature redundancy.
The intra-class correlation coefficient (ICC) was calculated for each feature using ICC3 of the Pingouin package 24,26. ICC values were interpreted with thresholds commonly used in ICC analysis, i.e. ICC 0.75-1 = excellent 24

Imaging biomarker selection and model development.
We describe the workflow of feature selection and model development in a scheme in Supplementary Data S1 and in detail in Supplementary Data S8. We performed all analyses in Python 3.7.6. We used StandardScaler 27 to scale the data to unit variance. We used t-distributed stochastic neighbor embedding (t-SNE) plots to explore cluster distributions (scikit-learn 27). We split our dataset into 70% training and 30% testing on a patient level using GroupShuffleSplit 27. First, we assessed the lesion-wise response using seven different feature selection strategies and seven different machine learning models with hyperparameter optimization using Hyperopt 28 (see Supplementary Data S8). Feature selection and model development were done individually for radiomics features, clinical scores and their combination. This approach ensured that the radiomics model was benchmarked against clinical and combined models. The best working model was locked and transferred to predict the response on the patient level. The selected features were used to train a random survival forest for overall survival prediction using scikit-survival 0.16.1 29. The performance was assessed by the concordance index. We used the lifelines package 30

www.nature.com/scientificreports/

We computed the risk score that represents the expected number of events for a particular terminal node in the forest for the respective test patients. The patient with short survival yielded a higher risk score (26.89) than the patient with long overall survival (23.55). We depict the predicted Kaplan-Meier plot of the two patients in Fig. 3c, which revealed a significant difference in the log-rank test (p = 0.006).
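The concordance index used to assess the survival model can be illustrated with a minimal pure-Python sketch of Harrell's C-index. This is an illustration under stated assumptions, not the scikit-survival implementation used in the study: it ignores pairs with tied survival times, and all follow-up times, event indicators and two of the four risk scores are hypothetical. Only the two risk scores 26.89 and 23.55 are taken from the text above.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk scores (higher = shorter expected survival)

    A pair (i, j) is comparable when patient i has the shorter time and
    an observed event. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, shorter survival
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up in months; the first two risk scores are the
# values reported for the short- and long-survival patients.
times = [8, 40, 15, 30]
events = [1, 1, 0, 1]
risks = [26.89, 23.55, 25.0, 23.0]

c_index = concordance_index(times, events, risks)
```

In this toy example three of the four comparable pairs are ordered correctly, giving a C-index of 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of risk against survival time.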

Discussion
In this study, we assessed the utility of machine learning models in predicting response to repetitive TACE in HCC patients. We used Lipiodol-retention radiomics of the first post-TACE control CT as an imaging biomarker. We applied multiple feature selection strategies to train a multitude of machine learning models with exhaustive hyperparameter optimization to stratify tumor lesions' response to TACE. We transferred our lesion-wise model to the patient level and corroborated our findings by overall survival prognostication. We demonstrated the model's ability to assign tumor risk scores associated with shorter or longer overall survival. CT-derived features were benchmarked against clinical risk scores, and the best working model consisted of the combination of the single radiomics feature LargeDependenceHighGrayLevelEmphasis and the single clinical risk score mHAP_II_score_group.

HCC hallmark imaging characteristics (arterial hyperenhancement with portal venous/delayed wash-out) and mRECIST assessment of viable tumor components are well established, especially in patients treated with TACE 2. Recent studies aimed to identify imaging biomarkers extracted from pre-treatment contrast-enhanced imaging to build predictive models for HCC TACE response 11-14. The studies tended to build holistic nomograms including imaging and clinical features and yielded promising predictive performances for overall survival, with C-indices ranging from 0.70 to 0.77, which are in a similar range to our results 11,13,14. Kuang et al. achieved lesion-wise mRECIST response predictions with an AUC of approx. 0.81 using pre-treatment MRI and clinical data 12. No patient-wise or survival analysis was done and it remained unclear how many TACE were applied prior to the analysis 12. We followed a more stringent approach by building a model starting at a lesion-wise prediction, transferring the model to a patient-wise level and finally to overall survival.
Further, arterial-phase imaging might suffer from reduced image quality due to artifacts or poor arterial phase capture. This might limit the development of robust AI models, as such deficiencies add noise to a system which already suffers from robustness deficiencies even in an experimental setting 24,32,33. In line with prior studies 15,16, our results support the potential of lipiodol deposits to serve as an imaging biomarker. Miszczuk et al. 16 prospectively enrolled 39 liver cancer patients (n = 22, HCC) treated with TACE and showed that high Lipiodol coverage on the 24-h post-TACE CT was associated with response to therapy. Lipiodol retention may serve as a surrogate for arterial hyperenhancement 16, and the vascularization pattern of HCC lesions might have prognostic impact 34; our results provide quantitative corroboration of these findings. In our model, the GLDM feature LargeDependenceHighGrayLevelEmphasis, which measures the joint distribution of large dependence with higher gray-level values (https://pyradiomics.readthedocs.io/), had the highest predictive impact. This is in line with Brancato et al. 35, who predicted histological HCC grade by means of radiomics. The feature LargeDependenceHighGrayLevelEmphasis contributed to the most powerful model to differentiate histological grade 1 versus grade 3 tumors 35, emphasizing the feature's potential to serve as an imaging biomarker for HCC aggressiveness. The current ESMO clinical practice guidelines for hepatocellular carcinoma 2 do not recommend the use of prognostic scores for treatment algorithms outside clinical trials, and they describe only the hepatoma arterial-embolisation prognostic (HAP) score as a potential stratification tool for TACE in the future 2.
This is in line with the results of our study as the best performing clinical scores revealed biased train-/

With 61 patients and 94 lesions, our study population is rather small, which might lower generalizability, but our cohort is very homogeneous, including only patients with histologically confirmed HCC, a total of three TACE prior to response assessment and usage of the same chemotherapeutic agent in each patient. In approx. 20% of patients, additional degradable starch microspheres (EmboCept®S, PharmaCept GmbH, Berlin, Germany) were given, which might have altered the retention in our standard-of-care real-world population. We leveraged a multitude of feature selection and classification strategies; nevertheless, various degrees of overfitting were present in some models. Although we resampled the images to a spacing of 1 × 1 × 1 mm, we used standard-of-care imaging to develop our models, with post-embolization CTs of originally 5 mm slice thickness. Availability of true 1 mm reconstructions would have been favorable.
In conclusion, radiographic features derived from standard-of-care 24 h post-embolization CT have the potential to serve as imaging biomarkers for prognostication of response to TACE in HCC patients. Imaging biomarkers and clinical risk scores seem to incorporate complementary prognostic information, and a combined final model of a clinical risk score and a single radiomics feature revealed the best performance. This emerging approach might pave the way to aiding clinical decision making in a clinical domain currently dominated by subjective expert consensus. Such tools might enable more accurate stratification of patients for personalized healthcare, avoiding potential adverse events in patients who most likely will not respond to TACE.

Data availability
The datasets generated and/or analysed during the current study are not publicly available due to privacy regulations but are available from the corresponding author on reasonable request.
Received: 27 June 2022; Accepted: 6 January 2023

Table 3. Classifier, feature selection strategy and performance of the best lesion-wise models. AUC, area under the curve; LASSO, least absolute shrinkage and selection operator; RFA, recursive feature addition; RFE, recursive feature elimination. See Supplementary Data S8 for more information.