Computational learning of features for automated colonic polyp classification

Shape, texture, and color are critical features for assessing the degree of dysplasia in colonic polyps. A comprehensive analysis of these features is presented in this paper. Shape features are extracted using the generic Fourier descriptor (GFD). The nonsubsampled contourlet transform (NSCT) is used as the texture and color feature descriptor, with different combinations of filters. Analysis of variance (ANOVA) is applied to measure the statistical significance of the contribution of the different descriptors between two colonic polyp classes: non-neoplastic and neoplastic. The descriptors retained after ANOVA are optimized using a fuzzy entropy-based feature ranking algorithm. Finally, classification is performed using a Least Squares Support Vector Machine (LSSVM) and a Multi-layer Perceptron (MLP) with five-fold cross-validation to avoid overfitting. Evaluation of our analytical approach on two datasets suggests that the feature descriptors can efficiently characterize a colonic polyp, which in turn can help the early detection of colorectal carcinoma. Based on a comparison with four deep learning models, we demonstrate that the proposed approach outperforms existing feature-based methods of colonic polyp identification.

Results before feature selection. Initially, classifications are performed for all the feature sub-sets of the four datasets, viz. FV_DB1-NBI, FV_DB1-WL, FV_DB2-NBI, and FV_DB2-WL, without applying feature optimization. Results are shown separately for shape (Fig. 3) and NSCT (Fig. 4) for proper visualization. Accuracy is used as the assessment measure for the two classifiers, LSSVM and MLP. For shape features, accuracies for all the datasets (displayed along the Y-axis of Fig. 3) are in the range of 75%-90%. For NSCT, the (pyrexc, pkva) filter pair gives the best result with MLP, with accuracies of 91.83% (FV_DB1-NBI), 88.88% (FV_DB1-WL), 91.30% (FV_DB2-NBI), and 85.56% (FV_DB2-WL). It is observed that the 'pyrexc' NSPF works best. This filter is derived from a 1D filter using the maximally flat mapping function with two vanishing moments but exchanging the two high-pass filters, and it can represent smooth edges efficiently. Further, the 'pkva' NSDFB works better than 'sinc' because it captures the high-frequency content of the images, such as smooth contours and directional edges, and it has the best PSNR performance. Due to this property, 'pkva' performs best on real colonoscopy images, which have complex features to interpret. Five-fold cross-validation is applied for training and testing the models; the final accuracy is the average over the five training/testing runs. Between the two feature representation schemes F1 and F2, F1 gives better accuracy than F2. For an in-depth analysis of these feature vectors and filters across the classes, the ANOVA test described in the following section is applied to determine whether feature values were affected significantly across the different classes and transform filters for the two feature representation schemes F1 and F2. To this end, ANOVA was conducted considering the feature values as dependent variables (DVs) and two fixed factors, namely classes and transform filters (the latter only for NSCT).
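The fold-averaging protocol above can be sketched as follows. This is a minimal illustration with synthetic feature vectors and a simple nearest-centroid classifier standing in for the actual GFD/NSCT features and the MLP/LSSVM models; all names and numbers here are assumptions, not the study's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for polyp feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, (60, 36)),
               rng.normal(1.5, 1.0, (60, 36))])
y = np.array([0] * 60 + [1] * 60)

def nearest_centroid_accuracy(Xtr, ytr, Xte, yte):
    # Toy classifier: assign each test sample to the nearer class centroid.
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

# Five-fold cross-validation: shuffle, split into five folds, hold each
# fold out once, and average the five test accuracies.
idx = rng.permutation(len(y))
folds = np.array_split(idx, 5)
scores = []
for k in range(5):
    te = folds[k]
    tr = np.concatenate([folds[j] for j in range(5) if j != k])
    scores.append(nearest_centroid_accuracy(X[tr], y[tr], X[te], y[te]))
mean_accuracy = float(np.mean(scores))  # final reported accuracy
```

The same averaging applies regardless of the classifier plugged into the loop.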
In a statistical test of a hypothesis, the significance value indicates how different (or better) the result is from what would be expected under the null hypothesis. A small significance value (p < 0.05) indicates reliable evidence against the null hypothesis. In this case, the null (H0) and alternate (H1) hypotheses are: H0, the mean feature values do not differ across the levels of the factor (class or transform filter); H1, the mean feature values differ for at least one level. The outcomes are summarized in Table 1.
Observation based on class as a fixed factor. As shown in Table 1, for feature representation scheme F1, class revealed an overall influence on the DVs at a highly significant level only for GFD and NSCT, for both datasets. For feature representation scheme F2, no significant difference was observed across classes for either dataset. For classification, the factor 'class' must reveal an overall influence on the DVs at a highly significant level. On this basis, we conclude that only GFD and NSCT can be used for feature extraction. Moreover, feature representation scheme F1 is more efficient than F2, as obtained from the statistical significance test. It can be concluded that the mean and variance are not sufficient to represent the transform coefficients.
www.nature.com/scientificreports/
Observation based on filters as a fixed factor. There is no significant difference in feature values across the different transform filters for NSCT. Thus, we consider the filter that gives the best classification accuracy (from Fig. 4).
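A one-way ANOVA of a single descriptor value across the two polyp classes can be sketched as follows; the values are synthetic stand-ins, and with exactly two groups the F-test is equivalent to a two-sample t-test.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Synthetic values of one feature descriptor in the two polyp classes.
non_neoplastic = rng.normal(0.0, 1.0, 80)
neoplastic = rng.normal(0.8, 1.0, 80)

# H0: the class means are equal; reject when p < 0.05.
F, p = f_oneway(non_neoplastic, neoplastic)
significant = p < 0.05
```

In the study the same test is repeated per descriptor and per fixed factor (class, transform filter) to populate Table 1.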
The statistical significance of the transforms DWT, CRT, CVT, and RT was also checked for identifying the degree of abnormality. The results shown in Table 1 indicate that they were statistically insignificant: their feature values were not affected across the different classes. Hence, we conclude that NSCT is a better transform than DWT, CRT, CVT, and RT for colonoscopy polyp feature study.
Observation based on correlation. Further, we applied Pearson's correlation (r) to the feature vectors; r measures the strength of the association between two variables. Before the final classification, it is desirable to experiment on uncorrelated data to obtain higher performance. The resulting r values are displayed in Fig. 5, where highly correlated values are marked with black cells. The matrix representation helps show which feature vectors are highly correlated. We found that GFD and NSCT are not correlated but independent. However, F1 NSCT and F2 NSCT are correlated, as expected, because they are obtained from the same coefficients with different distribution fittings. Hence, we conclude that GFD and NSCT are independent of each other, and both can be considered for classification.
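The correlation check can be sketched as below. The three vectors are synthetic stand-ins for a GFD feature, an F1 NSCT feature, and an F2 NSCT feature derived from the same coefficients; the 0.8 threshold for "highly correlated" is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
gfd = rng.normal(size=n)                       # stand-in GFD feature
f1_nsct = rng.normal(size=n)                   # stand-in F1 NSCT feature
f2_nsct = f1_nsct + 0.1 * rng.normal(size=n)   # F2 from the same coefficients

# Pairwise Pearson correlation matrix (as visualized in Fig. 5).
R = np.corrcoef(np.vstack([gfd, f1_nsct, f2_nsct]))
highly_correlated = np.abs(R) > 0.8            # cells that would be black
```

Independent descriptors show near-zero off-diagonal entries, while the two NSCT representations correlate strongly, mirroring the observation in the text.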
Final feature set design. Based on the statistical test ANOVA, we have selected the following feature sets for DB1 and DB2.
Shape features are represented using GFD, and texture and color features are represented using NSCT with the (pyrexc, pkva) filter pair. Since F1 performs better than F2, the final feature representation scheme for NSCT is F1.
Shape, texture, and color features are thus obtained from GFD and NSCT. Since the MLP classifier performs better than LSSVM, the final classifier selected for the study is MLP (based on accuracy as the selection measure).
Feature selection. Fuzzy-based feature ranking is performed in this work: fuzzy entropy-based feature selection is applied to rank the features. Based on the ranking, feature dimensions of size k = 6, 12, 18, 24, . . . , m are evaluated, where m = 36 for GFD and m = 66 for NSCT. The minimum feature dimension is set to 6, because dimensions lower than this yield low accuracy. Classification using MLP is then performed for all of these feature dimensions. Finally, the peak accuracy is selected and the corresponding dimension is recorded. Graphs of accuracy along the different feature dimensions are shown in Fig. 6 for shape and Fig. 7 for NSCT; the peak amplitude is displayed in the graphs with its corresponding dimension. There is a significant reduction in feature dimension, which improves the overall accuracy of the classification. A comparative analysis is displayed in Fig. 8, where the reduction in feature dimension can be observed across the four datasets. Changes in accuracy using the different feature descriptors GFD (shape), NSCT (texture and color), and GFD+NSCT (shape, texture, and color) are shown graphically in Fig. 9. We observe an increase in accuracy in all cases, which shows that feature ranking plays a major role in removing redundancy. The statistical significance of the final selected feature vectors was assessed using the independent-sample (Student's) t-test 30. Here, sig ≤ 0.01 indicates the 1% level of significance and sig ≤ 0.05 the 5% level. The results are displayed in Table 2, where all significant features are highlighted with the symbol '*'.
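The dimension sweep above can be sketched as follows. The data are synthetic, with the columns assumed to be already sorted by rank (informative features first), and a nearest-centroid classifier stands in for the MLP; only the sweep-and-pick-peak logic is the point here.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 36, 120                       # m = full GFD dimension (illustrative)
X = rng.normal(size=(n, m))
y = (rng.random(n) < 0.5).astype(int)
X[:, :8] += y[:, None] * 1.2         # top-ranked features carry class signal

def cv_accuracy(Xk, y, folds=5):
    # Five-fold CV with a toy nearest-centroid classifier.
    idx = np.arange(len(y))
    scores = []
    for k in range(folds):
        te = idx[k::folds]
        tr = np.setdiff1d(idx, te)
        c0, c1 = Xk[tr][y[tr] == 0].mean(0), Xk[tr][y[tr] == 1].mean(0)
        pred = (np.linalg.norm(Xk[te] - c1, axis=1) <
                np.linalg.norm(Xk[te] - c0, axis=1))
        scores.append((pred.astype(int) == y[te]).mean())
    return np.mean(scores)

dims = list(range(6, m + 1, 6))      # k = 6, 12, ..., 36
accs = [cv_accuracy(X[:, :k], y) for k in dims]
best_k = dims[int(np.argmax(accs))]  # dimension at peak accuracy
```

Truncating at the peak discards the low-ranked (mostly noisy) features, which is why the reduced dimension can also raise accuracy.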
Results after feature selection. We use six measures, namely accuracy, sensitivity (recall), specificity, precision, f-measure, and g-mean, to assess performance. All the results for the two datasets are listed in Table 3. The results are highly satisfactory with the GFD+NSCT feature, which gives the highest accuracies of 95.24% (DB1-NBI) and 94.68% (DB1-WL). It also shows consistent performance on the downloaded database, with accuracies of 95.72% (DB2-NBI) and 94.66% (DB2-WL). The other measures are listed in Table 3. This implies that shape, texture, and color are critical in describing the features of a polyp.
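All of the measures in Table 3 follow from the 2x2 confusion matrix; a straightforward helper (the counts in the usage note are made-up examples, not Table 3 values):

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # equals recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return accuracy, sensitivity, specificity, precision, f_measure, g_mean

acc, sens, spec, prec, f1, gm = binary_metrics(tp=45, fp=3, tn=47, fn=5)
```

The g-mean balances sensitivity and specificity, which is useful when the two polyp classes are imbalanced.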
Comparison with the existing work. For a fair comparison, we selected only those existing works that concentrate mainly on the classification of polyps using specific features. We selected the DB2-NBI dataset for comparison and only two levels of classification (neoplastic and non-neoplastic). Four conventional machine learning methods were compared; the comparative results are listed in Table 4. In Approach 1 10 , invariant Gabor texture and rotation-invariant LBP were used for 2D texture feature analysis, while for 2D color features the authors investigated color naming, discriminative color, hue, opponent color, and color GLCM; 3D shape features were analyzed using Shape-DNA and kernel PCA. In Approach 2 12 , invariant Gabor texture descriptors were classified using SVM; Approach 3 13 uses color GLCM with kNN as the classifier; and in Approach 4 14 , Bag-of-Words descriptors with spatial pyramid matching were applied. As Table 4 shows, our proposed approach outperforms the four existing ones for colonic polyp classification. We made a second comparison with a deep learning work by Urban et al 21 (Approach 5 in Table 4). We tested their architecture on the dataset DB2-NBI and achieved an accuracy of 92.6% with a loss of 0.2234, indicating that our proposed work is efficient enough to generate results comparable to deep learning. Note that the running/training time of the proposed algorithm is much lower than that of the compared approaches. Also, our model retains total control over the feature engineering process, unlike deep learning models, whose features are difficult to explain, control, and interpret.
As statistical validation, we performed a statistical significance test (paired t-test) on the scores of each evaluation metric (accuracy, sensitivity, specificity, precision, recall, F-score, and G-mean) individually between our proposed method and each state-of-the-art method in a pairwise manner, and computed the statistical significance level (p value). If the p value is less than 0.05, the corresponding pairwise comparison for that metric is statistically significant (i.e., a significant win of our proposed method over the other method in terms of that metric). In summary, we obtained 11 significant wins among a total of 29 pairwise comparisons in our whole comparison experiments. The proposed work is also compared with four popular deep learning models, namely VGG16 31 , VGG19 31 , ResNet50 32 , and GoogleNet Inception V3 33 . These models are trained and tested with the limited dataset with augmentation; the augmentation methods used are cropping, shearing, rotation (45°, 90°, 120°), mirroring, skewing (left and right), inverting, and zooming. The parameters considered are as follows. VGG16/VGG19/ResNet50: batch size = 32, epochs = 12, verbose = 1; for the fine-tuned versions, we flattened the last layer and fine-tuned with ReLU activation and the Adadelta optimizer. For the fine-tuned GoogLeNet Inception V3 model, the hyperparameters were: batch size = 32, number of epochs = 12, SGD learning rate = 1e-4, momentum = 0.9, and transformation ratio = 0.05. Table 5 displays the results recorded from the experiments performed on DB2-NBI. GoogleNet Inception V3 shows a satisfactory result, with an accuracy of 94.79%.
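One pairwise comparison of the kind described above can be sketched with a paired t-test; the per-fold accuracies below are hypothetical numbers invented for illustration, not values from Tables 4 or 5.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies for our method and one baseline,
# paired by cross-validation fold.
ours = np.array([0.952, 0.948, 0.957, 0.944, 0.951])
baseline = np.array([0.921, 0.915, 0.930, 0.910, 0.926])

t, p = ttest_rel(ours, baseline)
# A "significant win" requires both p < 0.05 and a positive mean difference.
significant_win = (p < 0.05) and (t > 0)
```

Repeating this per metric and per baseline method yields the tally of significant wins over all pairwise comparisons.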
Our proposed method competes well, with an accuracy of 95.7%. Although deep learning models may work well, there are currently not enough large datasets for training. When the sample size is not large, deep learning models are expected to lose power, leading to poor predictive performance.

Materials and methods
Clinical information. In recent years, there have been many studies exploring applications of computer vision in early colonic carcinoma detection. To this end, some researchers have generated their own datasets while others have used publicly available datasets. Their objectives differ: some have targeted only the localization of polyps, while others have concentrated on significant frame recognition, classification, among others. Because of these diversities, comparison with existing works is not straightforward. Challenges of existing studies include unavailability of the generated datasets, difficulty in assessing the quality of the images, and unavailability of ground truths, among others 10 . Keeping all these challenges in mind, we performed our experiments on two datasets: one generated by us, the other a public dataset. We attempted to show that the results obtained from the experiments are consistent between the two datasets. For comparison purposes, we worked on the benchmark of the publicly available dataset only. Dr. Kunio Kasugai prepared the generated dataset (DB1) at the Department of Gastroenterology, Aichi Medical University, Nagakute, Japan, following proper ethical protocols. All methods were carried out in accordance with relevant guidelines and regulations. Prior to the colonoscopy examination, informed consent was collected from all subjects. For generating the dataset, Kasugai used both Narrow Band Imaging (NBI) and high-definition White Light (WL) colonoscopy. Two videos were recorded per patient, one through the NBI approach and the other through the WL approach. He then identified the keyframes where the polyp is appropriately visible; only those significant frames are retained in the dataset and used in the experiments. Video frame extraction and polyp tracking are not considered in this study, as those are entirely different fields of work.
The total number of images included in the study is shown in Fig. 2, along with a sample dataset. Only two classes are included in the study, namely neoplastic and non-neoplastic. This study has been approved by the Aichi Medical University ethical committee (January 15, 2018; Approval No. 2017-H304). The collection of data was approved by the institutional review board of Aichi Medical University, and the patients were given the opportunity to opt out. All endoscopic images were de-identified prior to their inclusion in the dataset.
The second dataset (DB2) is downloaded from http://www.depeca.uah.es/colonoscopy_dataset/. The details of this dataset are given in Fig. 2a. Both datasets are composed of frames taken from colonoscopy videos captured using two modalities, namely Narrow-Band Imaging (NBI) and White Light (WL). Ground truths of all the images are prepared and verified by experts.
Overview of the analytical approach. The overview of the proposed work is displayed in Fig. 1. This work has five major phases as follows.

Phase 1 - Dataset generation:
A dataset has been generated during this phase with two types of images, namely NBI (Narrow Band Imaging) and WL (White Light), and with two groups of images in each type, i.e. non-neoplastic and neoplastic. A brief description of the methods applied is included in the following sub-sections.

Shape descriptor GFD.
During polyp identification and classification, physicians always prioritize shape features above all others. Major factors include the size of the polyp region, irregularity of the polyp, thickness, etc. Good descriptors are therefore needed in the digital domain to quantify shape features. Shape descriptors are mainly divided into two broad categories, namely contour-based and region-based. A contour-based shape descriptor concentrates only on boundary information, ignoring the shape interior, whereas a region-based descriptor provides shape information based on all the pixels within a region, which makes it more favorable than the contour-based variety. In many publications, moments are used to represent region-based shape descriptors: for two examples, Hu 34 proposed two-dimensional moment invariants, and Kim et al 35 presented a descriptor that is invariant to rotation, scale, and translation. More works can be found in References [36][37][38] . The Zernike moment is considered one such strong shape descriptor 37 , but it is unable to capture spectral features in the radial direction and does not allow multi-resolution analysis along those directions. GFD, by contrast, avoids redundant features and allows multi-resolution feature analysis along both the radial and angular directions, which motivates us to use GFD as the shape descriptor. In this work, GFD is extracted in the spectral domain by applying a 2D Fourier transform to the polar raster-sampled image. This work is motivated by the work of Zhang et al 39 , which proved GFD to be better than the MPEG-7 descriptors.
For an image f(m, n), the Modified Polar Fourier transform (MPFT) can be obtained as

MPFT(ρ, ψ) = Σ_m Σ_n f(m, n) exp[ j2π( (r/R)ρ + (2πi/T)ψ ) ],   (1)

where MPFT(ρ, ψ) denotes the coefficient at radial frequency ρ and angular frequency ψ. In Eq. (1), 0 ≤ r = [(m − m_d)² + (n − n_d)²]^(1/2) and θ_i = i(2π/T) (0 ≤ i < T); (m_d, n_d) is the center of mass of the shape; R and A are the radial and angular resolutions, respectively, with 0 ≤ ρ < R and 0 ≤ ψ < A. This is followed by normalization to achieve rotation and scale invariance:

GFD = { |MPFT(0, 0)|/area, |MPFT(0, 1)|/|MPFT(0, 0)|, . . . , |MPFT(x, y)|/|MPFT(0, 0)| },

where area denotes the area of the bounding circle of the polyp, and x and y are the maximum radial and angular frequencies considered, respectively. Since shape features are captured by the lower frequencies, the first 36 features are considered for the study, with 4 radial and 9 angular frequencies. The final shape descriptor is therefore

f_GFD = GFD(I_polyp), with |f_GFD| = 36,

where I is the binary segmented image, the polyp is the region of interest, and |·| denotes the dimension of the feature vector. For the 2D shape feature, a single frame is considered in which the region of interest is clearly visible. This frame is selected by experienced doctors, and ground-truth marking of that image is performed. Segmentation is then performed on the ground-truth image using the K-means algorithm. Figure 10 displays how the binary image I is prepared for shape feature extraction.
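A compact sketch of the descriptor: sample each foreground pixel's polar coordinates about the centroid, accumulate the polar Fourier magnitudes for 4 radial and 9 angular frequencies, and normalize the DC term by the bounding-circle area and the rest by the DC magnitude. This is an illustrative implementation, not the authors' exact code; function and variable names are assumptions.

```python
import numpy as np

def gfd(binary_img, R=4, A=9):
    """Generic Fourier descriptor of a binary shape (sketch):
    polar Fourier magnitudes at R radial and A angular frequencies,
    normalized for rotation and scale invariance."""
    ys, xs = np.nonzero(binary_img)
    yc, xc = ys.mean(), xs.mean()                    # centre of mass
    r = np.hypot(ys - yc, xs - xc)
    theta = np.mod(np.arctan2(ys - yc, xs - xc), 2 * np.pi)
    maxr = r.max() if r.max() > 0 else 1.0
    F = np.empty((R, A))
    for rho in range(R):
        for psi in range(A):
            phase = 2 * np.pi * rho * (r / maxr) + psi * theta
            F[rho, psi] = np.abs(np.sum(np.exp(-1j * phase)))
    dc = F[0, 0]                                     # |MPFT(0,0)| = pixel count
    area = np.pi * maxr ** 2                         # bounding-circle area
    out = F.flatten()
    return np.concatenate(([dc / area], out[1:] / dc))
```

Because only magnitudes are kept and a uniform angular shift contributes a unit phase factor, the descriptor is (up to floating-point error) unchanged by rotating the shape.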
Texture and color descriptor. Texture and color descriptors help in quantifying color changes and irregularity on the surface of the polyp. Multi-resolution transforms are chosen for texture and color extraction, as they can represent images with fewer coefficients that can be observed across multiple scales, directions, and resolutions. They are localized in both the time and frequency domains, and no prior segmentation is needed for the analysis. Multi-resolution study offers a new direction for image representation on a solid mathematical base. In recent years, much effort has been made in designing directional representations of images, such as the Ridgelet 40 , Curvelet (CVT) 41,42 , Contourlet (CRT) 43 , NSCT 44,45 , Ripplet (RT) 46 , and Shearlet (SH) 47 transforms (among many more), under the concept of multi-scale geometric analysis. Wimmer et al 11 extracted features using the curvelet, contourlet, and shearlet transforms for colonic polyp detection. None of the above-mentioned papers attempted to establish the statistical significance of the features, which raises questions about their false-positive results. In this work, the statistical relevance of the DWT, CRT, and RT features was analysed using ANOVA, with a 5% cutoff for significance (Table 1). NSCT has the properties of flexible multi-scale anisotropy, multi-directional expandability, full shift-invariance, and fast implementation 44,48 . It is an improved version of the contourlet transform, which is not shift-invariant. The proposed work uses the algorithm of Cunha et al 48 to decompose the frequency plane into sub-bands: a Nonsubsampled Pyramidal Filter (NSP) ensures multi-scale anisotropy, and a Nonsubsampled Directional Filter Bank (NSDFB) provides multi-directional expandability.
In performing NSCT, an NSP split at each level first generates a lowpass and a bandpass version of the image, the latter obtained from the difference between the image and the prediction. An NSDFB then decomposes the high-pass sub-band into several directional sub-bands, and the scheme is iterated on the lowpass sub-band. A unit of NSCT is shown in Fig. 11. The combinations of pyramidal (pyr) and directional (dir) filters used for NSCT decomposition are {(pyr, dir): (9-7, sinc), (9-7, pkva), (pyrexc, sinc), (pyrexc, pkva)}. We use the publicly available NSCT implementation described in Cunha et al 48 for coefficient extraction. Before decomposition, the image is converted from RGB to YCbCr, where 'Y' represents the texture (luminance) information and 'Cb' and 'Cr' represent the color information. This is done to extract the non-uniform characteristics of the image, which is not possible with the RGB model, which projects uniform characteristics only. In addition, the 'Cb' and 'Cr' components are uncorrelated, which makes this model more suitable than others such as HSV.
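The RGB-to-YCbCr conversion applied before decomposition can be sketched with the standard ITU-R BT.601 full-range formula (the exact conversion used in the study is not stated, so this is an assumption):

```python
import numpy as np

def rgb_to_ycbcr(img):
    """ITU-R BT.601 full-range RGB -> YCbCr (uint8 in, float out).
    Y carries the luminance/texture information; Cb and Cr carry the
    colour information used for the chrominance sub-bands."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)
```

Each of the three output planes would then be decomposed separately by the NSCT filter bank.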
Feature representation for NSCT. Feature descriptors for NSCT are obtained by distribution fitting on the coefficients of each sub-band. The distributions used in this work are the Generalized Gaussian Distribution (GGD) and the Normal Distribution (ND). The parameters obtained after fitting are then used as feature descriptors.
NSCT coefficients generation. Consider the following function to explain the transform setup:

φ = NSCT(I, pyr, dir, decom_level),   (7)

Eq. (7) decomposes the image I using the pyramidal filter 'pyr' and directional filter 'dir' with decomposition level 'decom_level'; φ holds the transform coefficients obtained. The decomposition level considered for NSCT is [1, 2, 2]: there are three pyramidal decompositions, and the numbers of sub-bands from coarse to fine resolution are 1 (lowpass), 2, 4, and 4, respectively. The output φ has cell vectors such that:
1. φ1 = [φ[1]] is the lowpass sub-band image.
2. φ2 = [φ[2,1], φ[2,2]] are the two high-pass sub-bands obtained from the second-level decomposition.
3. φ3 = [φ[3,1], φ[3,2], φ[3,3], φ[3,4]] are the four directional sub-bands of the next finer pyramidal level.
4. φ4 = [φ[4,1], φ[4,2], φ[4,3], φ[4,4]] are the four directional sub-bands of the finest pyramidal level.
φ3 and φ4 are obtained from the third-level decomposition. Therefore, the total number of sub-bands generated is 11 (= 1 + 2 + 4 + 4). The three color planes of YCbCr yield 33 sets of transform coefficients. From each of the 33 coefficient sets, two parameters are extracted, leading to a feature vector of dimension 66 for each of the two representation schemes F1 and F2.
F1 GGD-based feature representation. In this representation, the feature of an image I is the collection of fitted GGD parameters { f^Y_(j,α), f^Y_(j,β) : j = 1, . . . , 11 }, together with the corresponding parameters for the Cb and Cr channels, where f^Y_(j,α) represents the alpha parameter of the transform coefficients of the jth sub-band of the Y channel and f^Y_(j,β) represents the beta parameter of the same sub-band. The parameters for the other two color channels are extracted in the same way. GGD is used in modeling when the concentration of values around the mean and the tail behavior are of particular interest.
Here, the α and β values are used as feature descriptors. The scale parameter α models the width of the probability density function (PDF) peak (the standard deviation), while the shape parameter β is inversely proportional to the decreasing rate of the peak. We apply the maximum likelihood method to estimate α and β.
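The maximum-likelihood GGD fit can be sketched with SciPy's generalized-normal distribution, whose shape and scale parameters play the roles of β and α here (this mapping and the synthetic coefficients are assumptions for illustration):

```python
import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(4)
# Synthetic stand-in for the coefficients of one NSCT sub-band.
coeffs = rng.normal(loc=0.0, scale=1.0, size=5000)

# Maximum-likelihood GGD fit; location is fixed at zero since band-pass
# sub-band coefficients are (approximately) zero-mean.
beta, loc, alpha = gennorm.fit(coeffs, floc=0.0)
```

For β = 2 the GGD reduces to a Gaussian (with σ = α/√2), so fitting Gaussian samples should recover a shape parameter near 2; heavier-tailed sub-bands give β < 2.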
F2 First-order statistics-based feature representation. In this representation, the feature of an image I is the collection { f^Y_(j,µ), f^Y_(j,σ) : j = 1, . . . , 11 }, together with the corresponding parameters for the Cb and Cr channels, where f^Y_(j,µ) is the mean and f^Y_(j,σ) the standard deviation of the transform coefficients of the jth sub-band of the Y channel. The central limit theorem states that, under certain (fairly common) conditions, the sum of many random variables follows a normal distribution, and every normal distribution is a version of the standard normal distribution whose domain has been stretched by a factor σ (the standard deviation) and translated by µ (the mean). The µ and σ values obtained after normal distribution fitting give f^κ_(j,µ) and f^κ_(j,σ), respectively, where κ denotes the color channel.
Final feature sets and database design. Shape, texture and color features are analyzed in this study.
Shape feature is extracted using GFD ( f GFD ). Texture and color features are extracted using NSCT ( f NSCT ).
For NSCT, four different filter combinations are used, which generate the feature vectors f_NSCT(9-7,sinc), f_NSCT(9-7,pkva), f_NSCT(pyrexc,sinc), and f_NSCT(pyrexc,pkva), where the subscript gives the filter names. For an image I, the feature vectors generated are thus f_GFD and the four f_NSCT vectors. We use two datasets, DB1 (generated) and DB2 (downloaded), each with two modalities: DB1-NBI, DB2-NBI (for NBI) and DB1-WL, DB2-WL (for WL). The final feature vectors for the study are therefore formed per dataset and modality.
Feature selection. Fuzzy entropy is an extension of Shannon entropy: the latter is a probabilistic method, as opposed to fuzzy entropy, which is a possibility-based method. In the fuzzy entropy-based process, fuzzy sets are used in the entropy calculations, and the membership of each sample along the different feature dimensions is measured. In this work, the method of Khushaba et al 49 is used to estimate the membership function. Unlike feature transformation techniques such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA), it does not transform the image features; rather, it ranks the features based on their contribution to an assigned class. This property allows efficient monitoring of the features during feature reduction.
Let us consider a feature vector FV = {f_1, f_2, . . . , f_l}, where l is the total number of features. The fuzzy membership value of the kth vector in the ith class is computed following Khushaba et al 49 , where ρ is the fuzzification parameter, ǫ is a small constant to avoid singularity, and σ is the standard deviation. Normalization of the obtained membership values is performed to obtain a standard set of features. For c classes, c fuzzy sets along each feature f are considered, each reflecting the membership degree in one of the c problem classes. The fuzzy joint probability P(f, c_i) of a particular feature belonging to class c_i gives the degree of contribution of the feature to that class, where U_i indicates the indices of the training vectors that belong to class i and T indicates the dimension of the feature vector. The joint fuzzy entropy H(f, C) of the features of each class is computed from these joint probabilities, and the complete fuzzy entropy is obtained by summing over the classes. The marginal entropy is H(f) = −Σ_i P(f_X_i) log P(f_X_i), where the X_i are the c fuzzy sets, and the marginal class entropy is H(C) = −Σ_i P(c_i) log P(c_i). The mutual information (MI) between a particular feature and the class label is then MI(f; C) = H(f) + H(C) − H(f, C). In this fuzzy entropy-based feature selection method, features are ranked by increasing or decreasing MI, depending on the application.
Classification. Two well-known classifiers, LSSVM 44,47,50 and MLP 51 , are used in this study. These two classifiers were selected because of their diversity in performance. Both classifiers were trained and tested individually on the two datasets, applying the five-fold cross-validation technique 52 . The main objective of the work is to perform classification [53][54][55] where each level signifies the degree of dysplasia in colonic polyps.
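A simplified version of the ranking can be sketched as below. This is not Khushaba et al's exact formulation (the fuzzification parameter ρ is omitted; Gaussian memberships around the class means stand in for the membership estimator), but it follows the same MI = H(f) + H(C) − H(f, C) scoring over fuzzy memberships.

```python
import numpy as np

def fuzzy_mi_rank(X, y, eps=1e-12):
    """Rank features by mutual information between class-centred fuzzy
    memberships and class labels (simplified sketch)."""
    classes = np.unique(y)
    n, d = X.shape
    p_c = np.array([(y == c).mean() for c in classes])
    h_c = -np.sum(p_c * np.log(p_c + eps))            # marginal H(C)
    mi = np.zeros(d)
    for j in range(d):
        f = X[:, j]
        sigma = f.std() + eps
        # membership of every sample in each class-centred fuzzy set
        mu = np.array([np.exp(-((f - f[y == c].mean()) ** 2) / (2 * sigma ** 2))
                       for c in classes])             # shape (c, n)
        mu = mu / (mu.sum(axis=0, keepdims=True) + eps)
        p_f = mu.sum(axis=1) / n                      # P(f in fuzzy set X_i)
        h_f = -np.sum(p_f * np.log(p_f + eps))        # marginal H(f)
        p_fc = np.array([[mu[i, y == c].sum() / n for c in classes]
                         for i in range(len(classes))])
        h_fc = -np.sum(p_fc * np.log(p_fc + eps))     # joint H(f, C)
        mi[j] = h_f + h_c - h_fc
    return np.argsort(-mi)                            # feature indices, best first

rng = np.random.default_rng(5)
y = np.array([0] * 100 + [1] * 100)
X = rng.normal(size=(200, 3))
X[:, 0] += y * 2.0            # only feature 0 separates the classes
ranking = fuzzy_mi_rank(X, y)
```

A feature whose memberships track the class labels gets high MI and is ranked first; pure-noise features score near zero.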

Discussion
In this work, we propose a novel approach to quantify shape, texture, and color features for detecting the stages of dysplasia in polyps. Our work indicates that shape features should be extracted by GFD, while texture and color features should be extracted by NSCT. ANOVA tests suggest that the GFD and NSCT features are significantly different between the two classes (neoplastic and non-neoplastic). Among all the NSCT filters, we selected the (pyrexc, pkva) pair. This is followed by feature optimization using the fuzzy entropy-based feature ranking algorithm, which reduces the feature dimensions significantly and also increases the overall accuracy of the MLP classifier. The achieved accuracies of 95.24% on the generated dataset and 95.72% on the public dataset demonstrate the efficiency of our approach. The features obtained are highly efficient in distinguishing polyps irrespective of the dataset. We also demonstrate that our method outperforms the four existing ones.
This work focuses only on single frames selected by the doctors. In future, we will integrate it with video tracking and automatic frame selection. Future work also includes the study of 3D features of polyps. In addition, we can implement our analytical approach as a computational tool for easy use. Furthermore, we will extend this framework with more computational strategies for the analysis of integrated multi-modal (multi-omics) data, combining biomedical images and next-generation sequencing array data for various diseases.