Wavelet radiomics features from multiphase CT images for screening hepatocellular carcinoma: analysis and comparison

Early detection of liver malignancy based on medical image analysis plays a crucial role in patient prognosis and personalized treatment. This task, however, is challenging due to several factors, including medical data scarcity and limited training samples. This paper presents a study of three important aspects of radiomics features from multiphase computed tomography (CT) for classifying hepatocellular carcinoma (HCC) and other focal liver lesions: wavelet-transformed feature extraction, relevant feature selection, and radiomics feature-based classification under inadequate training samples. Our analysis shows that combining radiomics features extracted from the wavelet and original CT domains enhances the classification performance significantly, compared with using those extracted from the wavelet or original domain only. To facilitate the multi-domain and multiphase radiomics feature combination, we introduce a logistic sparsity-based model for feature selection with Bayesian optimization and find that the proposed model yields more discriminative and relevant features than several existing methods, including filter-based, wrapper-based, and other model-based techniques. In addition, we present an analysis and performance comparison with several recent deep convolutional neural network (CNN)-based feature models proposed for hepatic lesion diagnosis. The results show that under the inadequate-data scenario, the proposed wavelet radiomics feature model produces comparable, if not higher, performance metrics than the CNN-based feature models in terms of the area under the curve.

According to Globocan 2020, liver malignancy is the sixth most common cancer overall and the third most prevalent cause of cancer death in both genders 1. Among primary liver cancers, the most frequently encountered is hepatocellular carcinoma (HCC) 2,3, for which treatment plans are distinguished from those of the remaining entities. The most crucial factors in enhancing patient prognosis are early detection and accurate characterization of HCC 4,5.
Over the past decade, there has been a growing interest in developing computer-aided diagnosis (CAD) of liver lesions based on medical image analysis. As medical data of hepatic lesions are typically scarce, the majority of approaches for liver lesion diagnosis can be divided into two categories: deep convolutional neural network (CNN) models and radiomics feature-based models. The deep CNN approach aims to enhance the performance of liver lesion prediction and overcome the issue of data scarcity by leveraging knowledge learned from similar image processing tasks. In particular, it uses deep CNN backbones well trained on generic image datasets and adapts them to the medical imaging domain, a technique also known as transfer learning (TL). With application to HCC identification, several deep CNN models, including VGG 6, ResNet-50 7, GoogleNet 8, and 3D ResNet-18 9, have been employed and have shown improved performance.
The radiomics feature analysis of liver lesions, on the other hand, has attracted numerous studies due to its capability of encoding informative biological information from medical images and handling insufficient training data. The main contributions of this paper are as follows:
1. We introduce wavelet-domain radiomics features derived from multiphase CT images to enrich the representations of different focal liver lesions (FLLs). We also analyze the effects of combining the wavelet- and original-domain radiomics features on HCC and non-HCC classification performance. Experimental results show that combining features extracted from the two domains significantly improves the discriminative capability compared to using only the wavelet-domain or original-domain features. Although wavelet radiomics features have been considered previously, this study, for the first time, introduces wavelet radiomics for representing FLLs imaged by the multiphase contrast-enhanced CT modality and analyzes their effects on classifying HCC and non-HCC. Such analysis and comparison have not been investigated so far.
2. This paper proposes a new model for efficient radiomics feature selection. The introduced model employs a sparse representation to handle the ill-posed feature selection problem, in which the number of studied samples is far fewer than the number of extracted radiomics features. Furthermore, the proposed model incorporates statistical logistic modeling to represent the conditional probability of the target output given the features. We formulate the problem using a Bayesian framework and introduce an algorithm to solve the feature selection problem efficiently.
3. This study analyzes and compares the proposed logistic sparsity feature selection model with several other techniques, including filter-based, wrapper-based, model-based, and dimensionality reduction methods. We find through experiments that the proposed logistic sparsity model is capable of yielding a compact and relevant feature subset and outperforms the other feature selection approaches with statistical significance. Furthermore, under a limited number of training samples, the relevant wavelet radiomics features tend to generalize well and outperform several deep CNN-based feature models proposed for liver lesion diagnosis.
4. We have prepared and processed a CT dataset of 253 patients with hepatic lesions to support this study.
The preparation comprises screening and annotating tasks. The former involves using both clinical and pathological information, first to select the patients of interest and then to identify the types of liver lesions in each chosen case. The latter requires experienced radiologists to manually annotate the masks of the hepatic lesions.

Hepatic lesion labeling and data splitting
The labeling process was conducted by radiologists with more than five years of experience in hepatic imaging to provide the best quality of the benchmark, including lesion segmentation masks and the corresponding lesion types. This process contains two steps: screening and annotating. In the screening step, the radiologists first selected the patients of interest, as mentioned in the Participants and Screening Protocol Subsection. They then determined the type of lesions in each case study. In detail, HCC was determined using histopathology reports and the evidence-based EASL/AASLD practice guidelines 33,34. Non-HCC was determined based on histopathology reports or typical imaging characteristics. The non-HCC group includes metastasis, intrahepatic cholangiocarcinoma, hemangioma, cyst, abscess, focal nodular hyperplasia, adenoma, lesions too small to characterize, undifferentiated lesions, and other rare lesions.
In the annotating step, the radiologists manually annotated the 3D region of each lesion using the free and open-source 3D Slicer imaging platform 35. Based on the screening information, the radiologists used the CT phase specified in the screening step that provided the clearest depiction of the lesions to annotate all FLLs, except those with a diameter smaller than 5 mm. Two radiologists independently annotated each lesion to guarantee an accurate mask of the FLLs. An annotation was accepted if the two independent radiologists' lesion masks reached a consensus with a Dice score greater than 0.8; otherwise, the radiologists discussed the case to confirm the annotated masks of the lesions. Each lesion's ground-truth mask was taken as the intersection of the accepted annotations, as illustrated in Figure 1. It is worth noting that rigid registration was performed to align the phases of each case study. In particular, the arterial and delayed phases were automatically registered to the venous phase through our developed registration algorithm based on the iterative closest point technique.
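The Dice-score consensus criterion above can be computed directly from two binary masks. A minimal NumPy sketch (the toy masks and the `dice_score` helper are illustrative, not from the dataset):

```python
import numpy as np

def dice_score(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary lesion masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy 3D annotations from two hypothetical readers
m1 = np.zeros((4, 4, 4), dtype=bool)
m2 = np.zeros((4, 4, 4), dtype=bool)
m1[1:3, 1:3, 1:3] = True   # 8 voxels
m2[1:3, 1:3, 1:4] = True   # 12 voxels, 8 of them shared
print(dice_score(m1, m2))  # 2*8 / (8+12) = 0.8
```

An annotation pair like the toy one above, with a Dice score of exactly 0.8, would not meet the "greater than 0.8" acceptance bar and would trigger a discussion between the readers.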
We randomly separated the introduced dataset into a training set and a test set with two criteria: (i) the ratio of the training set to the test set was at most 8:2, and (ii) the distributions of lesion types and sizes between the sets were as equal as possible. This results in a training set of 149 patients, with 99 HCC and 127 non-HCC lesions. The test set comprised 104 patients, with 59 HCC and 106 non-HCC lesions. Note that this study considered each lesion as a sample. Table 2 summarizes the sets.

Methods
This was a retrospective study with the approval of the Human Research Ethics Committee at the University Medical Center of Ho Chi Minh City (number 93/GCN-HDDD, dated September 17, 2021). Written informed consent was waived by the Human Research Ethics Committee of the University Medical Center of Ho Chi Minh City. All methods were performed following the ethical standards of the Helsinki Declaration.
The rest of this section presents the workflow of the proposed radiomics feature-based model for classifying HCC and non-HCC lesions. We first describe the radiomics feature extraction and selection, and then the classification algorithms.

Radiomics feature extraction and selection
Radiomics features of the liver lesions can be extracted using the ROI on the segmented images. This study investigates features in the original CT and wavelet domains for informative representations of liver diseases.

Feature extraction
Feature extraction was performed in both the original and wavelet domains. To mitigate the influence of inconsistent CT scan spacing within our dataset, prior to feature extraction we employed cubic spline interpolation to resample the CT images to a uniform spacing of 1 mm × 1 mm × 1 mm. Each phase image of the original CT domain contributes 100 attributes, including 18 first-order statistics features, 14 shape features, and 68 texture features. We detail the list of all the radiomics features used in the supplementary document.
The first-order features characterize the spatial distribution of voxel intensities within the ROI. Such features represent commonly used metrics, including mean, variance, skewness, entropy, and uniformity. They are computed using direct image intensities or based on the histogram of the liver lesion ROI.
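As an illustration, a few of these first-order statistics can be computed from an ROI with plain NumPy; the bin count and the synthetic intensities below are arbitrary choices for the sketch, not the study's settings:

```python
import numpy as np

def first_order_features(roi: np.ndarray, bins: int = 32) -> dict:
    """A handful of first-order radiomics statistics over the voxels of an ROI."""
    x = roi.ravel().astype(float)
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()          # discretized intensity probabilities
    p_nz = p[p > 0]                # avoid log(0) in the entropy term
    return {
        "mean": x.mean(),
        "variance": x.var(),
        "skewness": ((x - x.mean()) ** 3).mean() / (x.std() ** 3 + 1e-12),
        "entropy": float(-(p_nz * np.log2(p_nz)).sum()),
        "uniformity": float((p ** 2).sum()),
    }

# Synthetic HU-like intensities standing in for a lesion ROI
rng = np.random.default_rng(0)
roi = rng.normal(60.0, 10.0, size=(16, 16, 16))
feats = first_order_features(roi)
```

The histogram-based entropy and uniformity here follow the usual radiomics definitions; in the experiments themselves these values are produced by pyradiomics rather than hand-rolled code.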
The shape features, on the other hand, are independent of the intensity distribution and give a visual representation of the FLLs. Typical shape attributes include diameter, area, and volume. Furthermore, elongation and flatness are also included as potential shape-related biomarkers. The texture features are based on second-order statistics and are described via the density histogram and the spatial locations of image pixels. Three types of textures, including the gray level co-occurrence matrix (GLCM) 36, gray level run length matrix (GLRLM) 37, and gray level size zone matrix (GLSZM) 38, are considered in this study.
In addition to the original-domain features, we extract several features from wavelet-derived images, namely higher-order features. They are extracted based on the first-order statistics and second-order textural features. These features are captured from the wavelet-domain images transformed by applying a high-pass (H) or low-pass (L) filter along each of the three dimensions of the CT image. For the first level of wavelet decomposition, this filtering results in a total of 8 wavelet-filtered images: wavelet-LLL, wavelet-LLH, wavelet-LHL, wavelet-LHH, wavelet-HLL, wavelet-HLH, wavelet-HHL, and wavelet-HHH.

Table 1. A summary of the dataset used in the experiments. Note that a patient can have both HCC and non-HCC lesions, so the sum of the numbers of patients with HCC and with non-HCC is greater than the total number of patients. The annotation consensus between annotators was measured using the Dice score.
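The eight wavelet-filtered images arise from applying a low-pass or high-pass filter along each of the three axes in turn. A minimal one-level 3-D Haar analysis sketch in NumPy (a toy even-sized volume; this is a hand-rolled illustration, not the pyradiomics implementation):

```python
import numpy as np

def haar_1d(x: np.ndarray, axis: int):
    """One-level 1-D Haar analysis along `axis` (even length assumed)."""
    a = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    b = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)  # low-pass, high-pass

def haar_3d_level1(vol: np.ndarray) -> dict:
    """All 8 combinations of L/H filtering over the 3 axes of a volume."""
    subbands = {"": vol}
    for axis in range(3):
        nxt = {}
        for name, v in subbands.items():
            lo, hi = haar_1d(v, axis)
            nxt[name + "L"], nxt[name + "H"] = lo, hi
        subbands = nxt
    return subbands  # keys: 'LLL', 'LLH', ..., 'HHH'

vol = np.random.default_rng(1).normal(size=(8, 8, 8))  # toy CT volume
bands = haar_3d_level1(vol)
```

First-order and texture features are then extracted from each of the eight subband images, which is why the wavelet domain multiplies the feature count so quickly.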

Feature selection
Feature selection plays a crucial role in radiomics feature-based CAD models, as this task produces compact but representative features that lead to improved interpretation, prediction, and generalization. In general, feature selection can be performed using several techniques, including filter-based, wrapper-based, or model-based methods 39. The filter-based methods select useful features by considering the statistical properties of the features. One widely used filtering technique is feature variance thresholding, which examines the feature variances and removes those with low values, i.e., likely containing little information. Another common technique, feature correlation thresholding, checks for features that are highly correlated with others and eliminates one of each such pair, as they tend to contain redundant information.
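Both filter-based techniques can be sketched in a few lines; the toy feature matrix and the exact thresholds below are illustrative (the experiments later use their own settings):

```python
import numpy as np

def variance_threshold(X: np.ndarray, tau_v: float) -> np.ndarray:
    """Indices of feature columns whose variance exceeds tau_v."""
    return np.flatnonzero(X.var(axis=0) > tau_v)

def correlation_threshold(X: np.ndarray, tau_c: float) -> np.ndarray:
    """Greedily keep a feature only if it is not correlated above tau_c
    with any feature already kept (drops one of every redundant pair)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= tau_c for k in keep):
            keep.append(j)
    return np.array(keep)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[:, 1] = 0.01 * rng.normal(size=50)             # near-constant feature
X[:, 3] = X[:, 0] + 1e-6 * rng.normal(size=50)   # near-duplicate of feature 0
kept_v = variance_threshold(X, tau_v=0.5)        # feature 1 is pruned
kept_c = correlation_threshold(X, tau_c=0.99)    # feature 3 is pruned
```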
The wrapper-based methods, on the other hand, employ an appointed model (regressor or classifier) to select features. The idea is to repeatedly train the appointed model, which contains parameters, namely weights or coefficients. First, the model is trained using all the features. Then, the features are selected based on their importance scores/ranks, corresponding to large absolute weights or coefficients. Note that the features selected by the wrapper-based methods may be sub-optimal, as they are subjective to the nominated model.
In this study, we investigate the model-based techniques, as they perform feature selection in the process of model construction. In particular, we consider feature selection as an ill-posed problem and introduce an efficient technique based on sparse representation to identify a compact but informative subset of features. It is worth noting here that feature selection is regarded as an ill-posed or under-determined problem because the number of training samples (M) is far fewer than the number of features/variables (N), i.e., M ≪ N, especially for multiphase radiomics feature selection problems. This ill-posed problem can be addressed efficiently using least squares (LS) optimization with regularization.
Let us denote the supervised learning task as having M training samples {(x_i, y_i), i = 1, ..., M}; each x_i ∈ R^N is an N-dimensional feature vector, and y_i ∈ {0, 1} is a class label. It is worth noting here that, for model performance improvement and outlier impact reduction, the feature vector x_i is standardized as

x_i ← (x_i − µ)/σ,    (1)

to make sure it has zero mean and unit standard deviation, where µ and σ are, respectively, the mean and standard deviation of the feature vector x_i.
Constructing the target vector y = [y_1, y_2, ..., y_M]^T ∈ R^M and the feature matrix X = [x_1, x_2, ..., x_M]^T ∈ R^{M×N}, the features can be selected by solving the following ℓ2-norm regularized LS problem:

minimize_θ ‖y − Xθ‖_2^2 + λ‖θ‖_2^2,    (2)

where θ ∈ R^N is the parameter vector (weights/coefficients) and λ > 0 is a hyperparameter. Problem (2) is known as ridge regression in the statistics literature. This problem consists of two terms. The first is the LS term, which attempts to fit the estimated response to the target, and the second is the ℓ2 regularizer, used to prevent the parameter values from growing large.
Using the ℓ2 regularizer can alleviate the over-fitting issue, but this model cannot yield a compact feature subset, because the ridge regressor does not guarantee a sparse solution for the parameter vector θ. To enforce model sparsity, we can replace the ℓ2 with the ℓ1 regularizer as

minimize_θ ‖y − Xθ‖_2^2 + λ‖θ‖_1.    (3)

In (3), the ℓ1-norm of the vector θ is defined as the sum of the absolute values of its entries: ‖θ‖_1 = Σ_{i=1}^{N} |θ_i|. This ℓ1-norm regularization promotes sparsity by driving many entries of θ to zero. The non-zero entries θ_i in θ correspond to the important features x_i in X, in the spirit of the relevance vector machine (RVM) 40. This sparsity-based model is similar to the LASSO technique in the statistics literature and is robust to problems with many irrelevant features 41. Note that the sparsity level, i.e., the number of nonzero values K (K ≪ N), is governed by the hyperparameter λ. Increasing λ leads to a sparser model and thus fewer relevant features; conversely, decreasing λ selects more features. In practice, this important hyperparameter can be determined using search techniques with cross-validation, such as grid search, random search 42, or Bayesian optimization 43.
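For illustration, problem (3) can be solved with the iterative shrinkage-thresholding algorithm (ISTA), alternating a gradient step on the LS term with soft-thresholding. A self-contained NumPy sketch on synthetic data (not the authors' solver; the dimensions and λ are arbitrary):

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """ISTA for min_theta ||y - X theta||_2^2 + lam * ||theta||_1."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2   # Lipschitz constant of the LS gradient
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ theta - y)
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

# Ill-posed setting: far fewer samples (M) than features (N), as in the paper
rng = np.random.default_rng(3)
M, N = 40, 100
X = rng.normal(size=(M, N))
true = np.zeros(N)
true[[3, 17, 42]] = [2.0, -1.5, 1.0]              # only 3 truly relevant features
y = X @ true + 0.01 * rng.normal(size=M)
theta = lasso_ista(X, y, lam=1.0)
print(np.count_nonzero(theta), "nonzero coefficients out of", N)
```

Even though N > M, the ℓ1 penalty drives most coefficients to exactly zero, and the surviving indices act as the selected features.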
The sparsity-based regression model in (3) is suitable for problems where the target vector y contains continuous entries. However, in our case, the vector y comprises variables with only two states, 1 and 0, representing the HCC and non-HCC classes, respectively. Thus, extending the model in (3) is crucial to make it more efficient for our problem. The extension can be made by employing the statistical logistic regression method to model the probability of the target output. At the same time, we aim to keep the ℓ1 regularizer to maintain the model sparsity and RVM property. Integrating the logistic model and sparse representation may enhance the feature selection performance.
The proposed logistic sparsity-based regression can be modeled using the Bayesian framework. In particular, the probability distribution of the target y_i given the feature vector x_i can be expressed as

p(y_i = 1 | x_i, θ) = σ(θ^T x_i).    (4)

Here, σ(·) is the logistic sigmoid function. The prior distribution introduced on the parameter θ is the Laplacian function given by

p(θ) = (λ/2)^N exp(−λ‖θ‖_1).    (5)

Using the likelihood function in (4) and the prior distribution in (5), we can estimate the parameters θ by maximum a posteriori (MAP) estimation. Note that maximizing the posterior function is equivalent to minimizing the negative log of this function, and thus we have the following optimization model:

minimize_θ − Σ_{i=1}^{M} [ y_i log σ(θ^T x_i) + (1 − y_i) log(1 − σ(θ^T x_i)) ] + λ‖θ‖_1.    (6)

Problem (6) can be solved efficiently via proximal splitting methods 44. By splitting, the first term in (6) is convex and differentiable, and the second term is the ℓ1 regularization, which has a closed-form proximal solution using the soft-thresholding or shrinkage technique 45-47. We detail the algorithm to solve Problem (6) in the supplementary document.
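A proximal-gradient sketch of Problem (6): a gradient step on the differentiable logistic term followed by the soft-thresholding proximal step for the ℓ1 term. This is a simplified stand-in for the algorithm detailed in the supplementary document, run here on synthetic data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logistic_sparsity(X, y, lam, n_iter=2000):
    """Proximal gradient for the MAP problem: the negative log-likelihood of
    the logistic model plus lam * ||theta||_1 from the Laplacian prior."""
    # 0.25 * ||X||_2^2 bounds the Lipschitz constant of the logistic gradient
    step = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ theta) - y)   # gradient of the smooth term
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Synthetic binary labels driven by two relevant features out of N
rng = np.random.default_rng(4)
M, N = 60, 200
X = rng.normal(size=(M, N))
w = np.zeros(N)
w[0], w[5] = 3.0, -3.0
y = (X @ w > 0).astype(float)
theta = logistic_sparsity(X, y, lam=2.0)
```

After convergence, the nonzero entries of θ both identify the selected features and define the sparse logistic classifier, which is exactly the dual role the LSR model plays in the rest of the paper.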
It is worth noting that both the logistic sparsity regression (LSR) and LASSO are used for feature selection, but LSR is more suitable for HCC and non-HCC classification. LASSO primarily focuses on feature selection, whereas LSR excels in both feature selection and classification tasks due to its logistic regression foundation. Furthermore, LASSO's linearity in (3) does not consider the non-linearities in the relationship between the wavelet-radiomics features and the target classes. In contrast, LSR's logistic formulation in (6) accommodates these complexities, enabling it to model intricate radiomics-target relationships.

Classification of HCC and non-HCC
To differentiate between HCC and non-HCC lesions, we can apply any binary classification technique to the selected radiomics features. In this study, we aim to investigate the effects of using radiomics features extracted from the wavelet domain and their combination with those extracted from the original CT images. Furthermore, we evaluate the effectiveness of the proposed logistic sparsity-based model for both feature selection and classification. Therefore, we consider different popular classifiers, including the proposed LSR, the SVM, and the MLP.
The proposed LSR model in (6) can be considered an extension of the widely used logistic regression (LR) classifier in machine learning (ML). Note that for liver disease prediction, the standard LR was one of the prominent techniques employed in several studies, including 11,14,48,49. This study uses the LSR model for both radiomics feature selection and HCC and non-HCC classification.
The support vector machine (SVM) classifier is a popular technique for solving classification, regression, and novelty detection problems. In the classification case, the SVM is a decision machine designed to map the training examples to points in the feature space so as to maximize the margin between the categories. The key feature of the SVM is that its objective function not only maximizes the margin between the two classes but also minimizes a measure of the error on the training set 50,51. In liver lesion classification, state-of-the-art results obtained by the SVM have been reported in numerous studies, including 11,52,53.
The multi-layer perceptron (MLP) is a fully connected feed-forward neural network used extensively in classification and regression. Compared to the LR and SVM, the MLP is capable of yielding more complex decision boundaries. A comprehensive introduction to the MLP is given in 54. For liver lesion classification, the MLP has been used in several works, including 31,55,56.

Experimental results and discussion
This section presents the experimental results, performance analysis, and comparison for the different important aspects of the wavelet-radiomics feature-based approach to classifying HCC and non-HCC, including feature selection, the effect of combining features from different domains, comparison of different feature selection models, analysis of different wavelet families for optimal filter identification, and comparison with several deep CNN-based models. We first give the experimental setup, then describe the results, analysis, and discussion of the study findings.

Experimental protocol
Radiomics features were extracted from both the original CT and wavelet-filtered images. For the original CT domain, feature extraction was performed for all three phases, resulting in 300 features. For the wavelet-domain feature extraction, the Haar filter was employed with one-level wavelet decomposition, leading to a set of 2064 texture features. All the radiomics features were obtained using pyradiomics, a Python package for extracting radiomics features from medical imaging 57.
To evaluate the performance of the different models, standard performance metrics for binary classification problems were used, including the F1 score and the area under the curve (AUC) of the receiver operating characteristic (ROC). The F1 score is the harmonic mean of precision and recall. The AUC is a performance measure that quantifies the capability of distinguishing between the classes at different threshold levels 58.
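Both metrics can be computed without external ML libraries; the AUC below uses the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve. The tiny label/score vectors are purely illustrative:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_score(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count half)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([1, 1, 1, 0, 0, 0])
s = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
print(f1_score(y, (s > 0.5).astype(int)))  # 2/3
print(auc_score(y, s))                     # 8/9
```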
To assess the statistical significance of the proposed method, this study utilized the independent two-sample t-test 59 and DeLong's test on the AUC measures. These statistical tests determine whether the disparity between the means is likely to have occurred by chance (null hypothesis) or is statistically significant (alternative hypothesis), based on the calculated p-value. A p-value (p) less than 0.05 signifies rejection of the null hypothesis at a 95% confidence level. In the context of our experiments, this implies a significant distinction in the AUC measures between our proposed methods and the compared methods.

Relevant feature selection: dominance of wavelet-derived features over original-domain features
This experiment aims to examine the proposed LSR model for feature selection and the contribution of features from different domains to the selected feature subset. To this end, a total of N = 2364 features extracted from both the original and wavelet domains was used as input for feature selection. Since feature selection was performed on the training set, these features represent the key characteristics of the M = 226 training samples. The feature matrix X is therefore of size M × N = 226 × 2364, and the target vector y is of size M × 1 = 226 × 1.
To perform feature selection using the LSR in (6), it is vital to tune a suitable hyperparameter λ. This hyperparameter can be determined using grid or random search techniques, which are known to be computationally expensive. To overcome this limitation, we used a 10-time repeated fivefold cross-validation (CV) with Bayesian optimization (BO). BO provides a principled technique to direct the search toward a global optimum of the objective function (maximizing the AUC metric in our case). By building a probabilistic model of the objective function, the search is made effective with an acquisition function that chooses candidate samples for the next objective function evaluation. It has been shown in 42,43 that BO obtains better results in far fewer evaluations than its grid-search and random-search counterparts.
Figure 2 shows the performance metric AUC as a function of the hyperparameter λ. Here, the search range for this hyperparameter was initialized as [10^-6, 10^6]. It can be observed that the BO technique was very effective in that it directed the search toward the region containing the optimum value. Once the search was finalized, the optimal hyperparameter log(λ) = 0.84 was found at the maximum value of AUC = 0.91.
Using the obtained hyperparameter, the proposed LSR model selected the features by solving Problem (6). After convergence, the parameter vector θ has only K = 29 nonzero coefficients out of the total of N = 2364 entries (K/N = 1.23%). This means that the proposed LSR model selected only 29 features from the total of 2364 features extracted from both the original and wavelet domains. Table 3 lists the selected features corresponding to the non-zero coefficients. It can be observed that features extracted from both the original and wavelet domains were selected. However, the texture features from the wavelet domain tend to dominate those from the original counterpart: the wavelet domain contributes 20 of the 29 features (68.97%), while the original domain makes up 9 of the 29 features (31.03%). This implies that the wavelet-domain features play a very crucial role in HCC and non-HCC classification performance, especially when combined with those extracted from the original domain.
Furthermore, we find that all three phases contribute to the selected feature subset. The 29-feature selected subset comprises 13 features from the venous phase, 11 features from the delayed phase, and 5 features from the arterial phase. This result indicates that all the phases contain essential features necessary for screening HCC, and multiphase processing tends to be needed for HCC diagnosis.

Analysis and comparison of different models for radiomics feature selection
In this experiment, we examine the performance of the different feature selection techniques for HCC and non-HCC discrimination with three different classifiers, i.e., logistic sparsity regression, multi-layer perceptron, and support vector machine. Here, we considered four major approaches: filter-based, wrapper-based, dimensionality reduction with PCA, and model-based techniques. For the filter-based methods, we implemented two popular techniques, namely feature variance thresholding (FVT) and feature correlation thresholding (FCT), which have been employed in 20 for classifying HCC and hepatic hemangioma. Note that the FVT requires a pre-defined threshold τ_v for pruning the features with variances smaller than τ_v. Similarly, the FCT needs a pre-defined threshold τ_c to identify the level of high correlation between two features. In our experiment, we set τ_v = 0.5 and τ_c = 0.99. The wrapper-based approach, on the other hand, sticks to an appointed model to rank the features. Here we tested the wrapper method using LR and random forest (RF), as these models are efficient for ranking important features.
For the model-based techniques, together with the proposed logistic sparsity regression model, we also evaluated its variants of logistic ridge regression and logistic elastic-net regression. While the logistic sparsity regression uses the ℓ1 regularizer and the logistic ridge regression employs the ℓ2 regularizer, the logistic elastic-net regression enforces both the ℓ1 and ℓ2 penalties on the model parameters. For comparison, we tested the LASSO model employing ℓ1 for sparse feature selection, used for HCC identification in 20. Furthermore, we tested the widely used PCA method, which reduces the number of features while retaining the variance in the features. In our experiment, the PCA reduces the features while retaining 99% of the variance.
Table 4 lists the results and performance metrics obtained by a 10-time repeated fivefold cross-validation for the different classifiers using the features selected by the different feature selection techniques. Here, all the feature selection techniques were performed on the training subset with the full 2364 features extracted from both the original and wavelet domains. It can be observed that among the tested feature selection methods, the proposed logistic sparsity regression was the most efficient model. This model selected only 29 relevant features out of the total 2364 features (1.23% of the full feature set) and yielded the highest F1 and AUC metrics. The proposed logistic sparsity regression followed by the logistic sparsity classifier obtained the highest F1 score of 0.89 (95% CI 0.87-0.90) and an AUC of 0.96 (95% CI 0.95-0.96). On the other hand, the logistic ridge and elastic-net models produced lower performance metrics, even though they selected many more features. The AUCs produced by the logistic ridge and elastic-net are 0.92 (95% CI 0.91-0.93) and 0.90 (95% CI 0.89-0.91), respectively. Compared with these techniques, the LSR model enhances the performance with statistical significance (p < 0.0001). The LASSO produced good performance, as it is a sparse feature selector; it yielded an AUC of 0.95 (95% CI 0.94-0.96). The proposed LSR improves the mean AUC in comparison with LASSO, but not with statistical significance (p = 0.14) by the t-test. In contrast, PCA was found to be ineffective for extracting informative features for HCC and non-HCC classification. The wrapper-based methods with LR and RF using the 30 most important features obtained reasonable performance metrics.

Effects of wavelet-filtered features on classification performance
In this experiment, we aim to analyze the effects of using features extracted from different domains, and their combinations, on the performance of classifying HCC and non-HCC. To this end, we compare the prediction capabilities using three radiomics feature sets: (1) features extracted from both the original and wavelet domains, (2) features extracted from the original domain only, and (3) features extracted from the wavelet domain only. For each feature set, the important features are first selected by the best feature selection performer, the logistic sparsity regression, followed by prediction using the different classifiers.
Table 5 depicts the performance metrics obtained using the three domain feature sets by the different classifiers, evaluated on the training and test sets. On the training set, the models were evaluated with a 10-time repeated tenfold cross-validation. On the test set, the prediction was performed with a 200-time sampling-with-replacement bootstrapping technique. This technique randomly draws data points with replacement from the original test set to create multiple test sets of equal size. For each of these bootstrapped sets, we assessed the model's AUC performance. This process yielded a distribution of AUC values, accounting for variations in the test set composition. From this distribution, we calculated performance metrics, including the mean AUC and a 95% CI.
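The bootstrapping procedure described above can be sketched as follows; the resample count mirrors the 200 used in the study, while the labels and scores are synthetic stand-ins for real test-set predictions:

```python
import numpy as np

def bootstrap_auc_ci(y_true, scores, n_boot=200, alpha=0.05, seed=0):
    """Mean AUC and (1 - alpha) percentile CI from a sampling-with-replacement
    bootstrap over the test set."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)      # resample the test set, same size
        yb, sb = y_true[idx], scores[idx]
        if yb.min() == yb.max():              # AUC needs both classes present
            continue
        pos, neg = sb[yb == 1], sb[yb == 0]
        auc = ((pos[:, None] > neg[None, :]).mean()
               + 0.5 * (pos[:, None] == neg[None, :]).mean())
        aucs.append(auc)
    aucs = np.array(aucs)
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return aucs.mean(), (lo, hi)

# Synthetic test-set scores: positives tend to score higher than negatives
rng = np.random.default_rng(5)
y_true = np.array([1] * 80 + [0] * 80)
scores = np.where(y_true == 1, 0.7, 0.3) + 0.2 * rng.normal(size=160)
mean_auc, ci = bootstrap_auc_ci(y_true, scores)
```

The percentile interval here is the simplest bootstrap CI; the paper does not state which CI construction was used, so this is one plausible choice.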
Table 4. Performance metrics in terms of F1 and AUC by the different classifiers using radiomics features selected by the different selection methods. The methods marked with an asterisk (*) and/or a plus (+) symbol indicate statistical significance compared to the proposed LSR method using the same classifier at a confidence level of 95%, as determined by the t-test and/or DeLong's test, respectively. The significant values, compared to the corresponding group in the first column, are in bold.

The most noteworthy observation from the table is the improvement in the performance metrics obtained from the mixed-domain feature set, compared with those computed from the original-domain or wavelet-domain feature sets. For instance, using the LSR classifier, the mixture of the wavelet and original feature sets has an AUC of 0.96 (95% CI 0.95-0.96), whereas the original radiomics feature set yielded an AUC of 0.92 (95% CI 0.91-0.93), and the wavelet-domain features alone obtained an AUC of 0.90 (95% CI 0.89-0.91). The improvement of the mixed-domain features over either the original domain or the wavelet domain only is statistically significant (p < 0.0001), confirmed by both the t-test and DeLong's test. Similarly, on the test set, the prediction results show that combining the wavelet- and original-domain radiomics features enhances the classification performance compared with using the original-domain features only. This improvement is statistically significant (p < 0.0001), confirmed by DeLong's test, though the 95% CIs overlap. However, the enhancement of the mixed-domain features over the wavelet-domain features is not statistically significant according to the t-test (p > 0.05).

Effects of different wavelet-family radiomics features on classification performance
This section analyzes the effect of radiomics features based on different mother wavelets on HCC versus non-HCC classification performance, and aims to identify the most suitable wavelet family for this classification problem. We followed 62 to select four wavelet families for radiomics feature extraction: Haar, Daubechies 7, Biorthogonal 6.8, and Reverse biorthogonal 6.8. For each family, we performed a one-level wavelet decomposition and radiomics feature extraction, and then used LSR for feature selection and classification.
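To make the one-level decomposition step concrete, the sketch below implements it for the simplest of the four families, the Haar filter, on a plain nested-list image; in practice a library such as PyWavelets would supply all four families (Haar, Daubechies 7, Biorthogonal 6.8, Reverse biorthogonal 6.8), and radiomics features would then be recomputed on each sub-band. The function name and interface are illustrative.

```python
def haar_decompose(image):
    """One-level 2D orthonormal Haar transform.

    `image` is a list of lists with even height and width. Returns the
    four sub-bands (LL, LH, HL, HH); wavelet-domain radiomics features
    are extracted from sub-bands like these.
    """
    rows, cols = len(image), len(image[0])
    ll, lh, hl, hh = ([[0.0] * (cols // 2) for _ in range(rows // 2)]
                      for _ in range(4))
    for i in range(0, rows, 2):
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2.0  # approximation
            lh[i // 2][j // 2] = (a - b + c - d) / 2.0  # detail across columns
            hl[i // 2][j // 2] = (a + b - c - d) / 2.0  # detail across rows
            hh[i // 2][j // 2] = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh
```

A flat 2x2 patch maps entirely into the approximation band, while intensity differences between neighbouring pixels land in the three detail bands, which is what lets the wavelet-domain features capture lesion texture.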
Table 6 lists the results in terms of the subset of features selected and the performance metric AUC on the training and test sets. It can be observed that the proposed wavelet-based radiomics feature model is capable of yielding satisfactory classification performance, regardless of the wavelet family used. In addition, combining the wavelet- and original-domain features considerably enhances the AUC metrics. This observation is consistent across all the tested wavelet families. Among these wavelet transforms, on the test set, the Reverse biorthogonal 6.8 is the most suitable filter and yields the highest AUC of 0.89 (95% CI 0.84-0.94), followed by the Haar wavelet with an AUC of 0.87 (95% CI 0.82-0.92). According to the t-test and DeLong's test, the Reverse biorthogonal 6.8 wavelet family enhances the performance among the tested wavelet filters with statistical significance (p < 0.0001), even though their 95% CIs overlap.

Table 5. The performance metric AUC obtained on the training set and test set by the different classifiers using radiomics features extracted from the different image domains. Here, the Haar wavelet filter was used for the extraction of the radiomics features. The domain features marked with an asterisk (*) and/or plus (+) symbol indicate statistical significance compared to the mixed-domain features using the same classifier at a confidence level of 95%, as determined by the t-test and/or DeLong's test, respectively. The significant values, compared to the corresponding group in the first column, are in bold.
For illustration, Fig. 3 shows the AUC of the ROC obtained using the radiomics features extracted and selected from the different image domains, followed by the different classifiers. It can be observed that employing the wavelet-domain radiomics features improves the AUC. Furthermore, combining the different radiomics feature domains leads to an enhanced AUC compared to using either the original or the wavelet radiomics features only. This enhancement can be explained by the fact that the combination of the two feature domains gives a more informative and discriminative representation of HCC and non-HCC lesions.

Performance comparison with deep CNN-based models
This section presents a performance comparison between the proposed wavelet radiomics-based model and existing deep CNN-based approaches for the problem of HCC and non-HCC classification. As deep CNN-based models tend to yield unsatisfactory results when training samples are limited, for a fair comparison we consider deep CNN models built with transfer learning. Transfer learning relies on CNN backbones pretrained on other tasks and inherits the trained weights, i.e., knowledge is transferred to solve a new task. Transfer learning with several CNN backbones, including VGG 6 , ResNet-50 7 , and GoogleNet 8 , is considered here. Furthermore, since recent 3D CNN models have been shown to enhance liver lesion classification performance, we also implemented a deep 3D CNN model using the 3D ResNet-18 architecture 9 . The Python code using the pre-trained deep CNNs is given in the supplementary document.

Table 7 depicts the performance metrics in terms of F1 and AUC on the test set for the different models. It can be observed that the proposed wavelet-based radiomics feature model is capable of yielding satisfactory classification performance. It produced an F1 score of 0.80 (95% CI 0.73-0.86), comparable to the scores yielded by the deep CNN-TL using GoogleNet and the deep CNN using 3D ResNet-18. In terms of AUC, the proposed wavelet radiomics-based model considerably enhances the performance and yields the highest AUC of 0.89 (95% CI 0.83-0.93), followed by the deep CNN using the 3D ResNet-18 model with an AUC of 0.87 (95% CI 0.82-0.93), and the deep CNN-TL using GoogleNet with an AUC of 0.86 (95% CI 0.79-0.92). Figure 4 further illustrates the AUCs yielded by the different classification methods. Compared with the deep CNN-based models, the proposed wavelet radiomics-based approach yields higher AUC metrics with statistical significance (p < 0.0001), as confirmed by both the t-test and DeLong's test.

Table 6. The performance metric AUC obtained by using the different wavelet family filters for radiomics feature extraction from the different image domains. The wavelet families marked with an asterisk (*) and/or plus (+) symbol indicate statistical significance compared to Reverse biorthogonal 6.8 with the corresponding feature domain at a confidence level of 99.99% (p < 0.0001), as determined by the t-test and/or DeLong's test, respectively. The significant values, compared to the corresponding group in the first column, are in bold.

Conclusion
This paper presented an analysis of radiomics features extracted from multiphase CT images for classifying HCC and non-HCC liver lesions. From the experimental results, analysis, and comparisons, the following significant findings can be drawn. First, combining the wavelet-derived texture features with the original CT image features significantly improves the discrimination between HCC and non-HCC lesions. Second, the proposed logistic sparsity regression with Bayesian optimization is capable of selecting compact and relevant radiomics features for representing HCC and non-HCC lesions. The proposed logistic sparsity-based model is the most suitable feature selector among the tested counterparts and yields higher performance metrics in terms of AUC. Third, in limited training data cases, the proposed wavelet radiomics feature approach performs comparably to, if not better than, several recent deep CNN-based models used for HCC and non-HCC classification.
Table 7. Performance metrics in terms of F1 and AUC by the different approaches based on radiomics features and deep CNN features for HCC identification. The methods marked with asterisk (*) and plus (+) symbols indicate statistical significance compared to the proposed method at a confidence level of 99.99%, as determined by the t-test and DeLong's test, respectively. The significant AUC value is in bold.

Figure 1 .
Figure 1. Examples of CT venous slices (top row) and the hepatic lesion annotations overlaid (bottom row) in our dataset. HCC tumors (red) are in the first two CT slices, and non-HCC lesions (cyan) are in the last two CT slices. Best viewed in color.

Figure 2 .
Figure 2. Bayesian optimization with a 10-time repeated fivefold cross-validation searching for the hyperparameter used in the proposed logistic sparsity-based model based on the maximum-AUC criterion; the optimal regularization strength hyperparameter log(λ) = 0.84 was chosen at the maximum value of AUC = 0.91.

Figure 3 .
Figure 3. AUC of ROC produced by the different classifiers using radiomics features extracted and selected from the mixed, original, and wavelet domains. Here the Reverse biorthogonal 6.8 wavelet filter was used for the extraction of the radiomics features.

Figure 4 .
Figure 4. AUC ROC produced by the proposed radiomics features-based model and deep CNN-based approaches using different backbones for HCC and non-HCC classification.

Table 2 .
A summary of the training and test sets.

Table 3 .
The most relevant radiomics features selected by the proposed logistic sparsity-based regression model.These features correspond to the estimated dominant non-zero coefficients.