A dynamic lesion model for differentiation of malignant and benign pathologies

Malignant lesions have a high tendency to invade their surrounding environment compared to benign ones. This paper proposes a dynamic lesion model and explores the 2nd order derivatives at each image voxel, which reflect the rate of change of image intensity, as a quantitative measure of the tendency. The 2nd order derivatives at each image voxel are usually represented by the Hessian matrix, but it is difficult to quantify a matrix field (or image) through the lesion space as a measure of the tendency. We conjecture that the three eigenvalues contain important information of the Hessian matrix and are chosen as the surrogate representation of the Hessian matrix. By treating the three eigenvalues as a vector, called Hessian vector, which is defined in a local coordinate formed by three orthogonal Hessian eigenvectors and further adapting the gray level occurrence computing method to extract the vector texture descriptors (or measures) from the Hessian vector, a quantitative presentation for the dynamic lesion model is completed. The vector texture descriptors were applied to differentiate malignant from benign lesions from two pathologically proven datasets: colon polyps and lung nodules. The classification results not only outperform four state-of-the-art methods but also three radiologist experts.

Malignant lesions have a high tendency to invade their surrounding environment compared to benign ones, thus the lesion growth rate has been widely used as a figure of merit (FOM) for computer-aided diagnosis (CADx) of the lesion, for example, of colon polyps 1,2 and lung nodules 3,4 . However, this FOM requires at least two measurements over a time period of weeks or even months apart, incurring additional cost, patient stress and risk of delayed treatment.
On the other hand, numerous image features have been extracted from the lesion volume, followed by sophisticated feature selection and classification operations for the same task of CADx [5][6][7][8] . A typical example of a CADx pipeline takes a few steps of (1) localizing the lesion in a medical image [this step can be computerized, called computer-aided detection (CADe)]; (2) segmenting the lesion volume; (3) extracting features from the lesion volume, and (4) classifying the features for lesion diagnosis. By expanding the pipeline to include not only medical images but also genetic data and more, many successful applications have been reported in the field of radiomics [9][10][11] .
Recently, great efforts have been devoted to construct various convolutional neural network (CNN) architectures to deeply learn the features directly from the input medical images and simultaneously classify the learnt features for the same ultimate objective, i.e. CADx of lesions [12][13][14][15] . The efforts have been further devoted to expanding the CNN architectures to learn not only the medical images but also the textures and more 16,17 .
To our knowledge, both the above research endowers for sophisticated feature extraction and classification operations and the recent CNN-based deep machine learning architectures have not explicitly considered the clinical observations that malignant lesions have high tendency to invade their surrounding environment compared to benign ones [18][19][20][21][22][23] . This paper proposes a dynamic lesion model and explores the feasibility of explicitly considering the high invading tendency for the task of CADx of lesions.
The proposed dynamic lesion model is based on a conjecture that the high tendency is related to the rate of image intensity changing at each image voxel inside the lesion volume. The rate is mathematically described by the 2nd order derivative operation, usually represented by the Hessian matrix at each image voxel 24 . Thus, the dynamic lesion model is represented by a matrix field within the lesion space. Extracting quantitative descriptors from the matrix field or matrix image to measure the tendency is mathematically challenging. To relieve this mathematical challenge, we have another conjecture below.
Based on the recent report that the eigenvalues and the eigenvectors of the Hessian matrix are mathematically related to each other 25 , we conjecture that the three eigenvalues contain most, if not all, the information of the Hessian matrix and, therefore, the three eigenvalues are chosen as the surrogate representation of the Hessian matrix. By treating the three eigenvalues as a vector, called Hessian vector hereafter, described by its two angular variables and one magnitude variable at each image voxel, we can obtain a vector image through the lesion volume. By adapting the well-known gray level co-occurrence (GLCM) computing method 7,26 to extract texture measures (or descriptors) from the vector image, followed by classification of the extracted vector texture descriptors, a quantitative description for the dynamic lesion model is completed.
Experimental results from pathologically proven lesion datasets demonstrate the superiority of the dynamic lesion model over state-of-the-art classification methods [26][27][28][29] . Particularly, the demonstration of outperformance of the presented dynamic lesion model over three experienced radiologists indicate the great potential of modelbased, task-driven, artificial intelligence (AI)-enabled CADx, which not only learns the experience that experts have learned, but also knows the task that experts are heading to. A similar demonstration of the great potential is the task-driven, AI-enabled AlphaGo 30 , which can outperform experts because it not only learns how to play the game from the experts, but also knows the task of occupying the largest space when taking each step further, while experts may not be able to achieve that task at every step.
The remainder of this paper is organized as follows. "Methods" describes the dynamic lesion model and the model representation by Hessian matrix and its eigenvalue vector at each image voxel, followed by the presentation of GLCM computation for extraction of the vector texture descriptors from the Hessian vector image. "Experiments and results" presents the experimental outcomes from two pathologically proven datasets, colon polyps and lung nodules, for the task of differentiating malignant from benign lesions with comparison to four existing state-of-the-art lesion classification CADx methods and three radiologist experts. Lastly, discussions and conclusions are drawn in "Conclusions and discussions".

Methods
As introduced above, this paper proposes a novel dynamic lesion model to perform the CADx task. Inspired by the clinical observation of different invasion properties for different lesion types, the model is deliberately designed to express the invading tendency using Hessian Matrix representation. In this section, we will describe the dynamic model in detail.

Dynamic lesion model and hessian matrix representation.
Our proposed dynamic lesion model is based on the observations that a malignant lesion has a high tendency to invade its surrounding environment compared to a benign one 20,21 . We hypothesize that the tendency for lesions to invade their surrounding environment is related to the rate of change of the image intensity values at each image voxel. The proposed method uses a Hessian matrix representation of the 2nd order derivative to model these changes in intensity.
The rate of the change is mathematically expressed by the 2nd order derivative operator of a scalar function or intensity image I = I x, y, z in three-dimensional (3D) space and could be represented by the Hessian matrix as follows 24 : where I xx , I xy , I xz , I yy , I yz , and I zz are the 2nd order derivatives of I x, y, z . To compute these partial derivatives of the intensity image, we use the well-established Deriche filters with parameter α = 1 as a default in the article 31 . The lesion model is then represented by a matrix at each image voxel through the lesion space, i.e. a matrix field or matrix image. Constructing image feature measures or descriptors from a matrix field for the task of CADx of lesions is difficult. To mitigate this difficulty, we adapt the traditional analysis of decomposing the Hessian matrix at each image voxel location.
Hessian eigenvalues as the surrogate representation of hessian matrix. Traditionally, Hessian matrix is frequently utilized to describe local image geometries, such as edges and corners in image processing 32,33 . Because of the 2nd order derivative operation on a scalar function, Hessian matrix reflects the rate of the scalar function changing, which describes not only the local geometric (shape) properties but also the local moving tendency. While Hessian matrix contains very useful local geometric and dynamic information of a lesion in a medical image, it is very challenging to directly utilize it to construct image feature descriptors for the task of CADx of lesions. To combat this challenge, we adapt the traditional approach of decomposing the Hessian matrix as follows 24 : where 1 ≥ 2 ≥ 3 are three eigenvalues, v 1 ,v 2 , and v 3 are their corresponding eigenvectors, respectively, which are all orthogonal with each other and form three new basis in the 3D Euclidean space.    www.nature.com/scientificreports/ Based on the recent report that the eigenvalues and the eigenvectors are mathematically related to each other 25 , we can choose either the three eigenvalues or the three eigenvectors to construct image feature descriptors. Since the three eigenvalues contain fewer variables than the three eigenvectors and have advantages in computation, the eigenvalues would be a better choice.
Furthermore, the three eigenvalues can be viewed as the projections of a vector on the three orthogonal eigenvectors, respectively, and the vector contains not only the strength of motion tendency (i.e. the vector magnitude), but also the direction of motion tendency (the vector orientation). Thus, we have another conjecture that the vector, made from the three eigenvalues and called Hessian vector hereafter, is a surrogate representation of the Hessian matrix.
Based on the two conjectures above, our dynamic lesion model is then represented by the Hessian vector. For the CADx task, we will turn to how to extract quantitative measures about the lesion tendency from the vector field.
Vector-based co-occurrence matrix or vector texture image. From the decomposition of Hessian matrix, one image voxel in volumetric data could produce three eigenvalues which could be formed as a local vector (or Hessian vector), i.e. ( 1 , 2 , 3 ) . This 3D vector image could be represented at each image voxel by: where x, y, z is the coordinate in the original image I x, y, z , and the values m and M are the minimum and maximum eigenvalues among all the image voxels. Thus, each image voxel inside the Hessian vector field is specified by its three values of ( 1 , 2 , 3 ) of Eq. (3) within the range of Eq. (4). Figure 1 illustrates that V x, y, z should be a Hessian vector in a new local 3D Euclidean space, defined by its three orthogonal eigenvectors, i.e. v 1 , v 2 , and v 3 .
As a vector, V x, y, z could be expressed by the spherical coordinates as: where | | is the magnitude, θ and ∅ are two angles representing the vector direction as shown in Fig. 1. For simplicity in presentation, | |= will be assumed hereafter. The Hessian vector of Eq. (5) is defined by the three variables as follows: (3) V x, y, z = 1 x, y, z , 2 x, y, z , 3 x, y, z , www.nature.com/scientificreports/ Since , θ and ∅ are three geometric measures, they are independent with any coordinates even if they are expressed by 1 , 2 , and 3 which are three functions expressed by ( x, y, z ). So , θ , and ∅ are three very stable local metrics, which are more suitable to describe local properties in lesions.
Given the Hessian vector images of Eqs. (5)-(8) from the acquired intensity image, I = I x, y, z , our task is to find or develop a computing method to extract quantitative measures from these vector images.
The magnitude ( ), azimuth angle ( θ ) and polar angle ( ∅ ) form another equivalent Hessian vector of Eq. (3) which produces the vector image, represented at each image voxel by: where the minimum value m and the maximum value M are determined by Eq. (4), given the acquired image I x, y, z . Thus, each image voxel inside the Hessian vector field is specified by its three values of ( , θ, ∅) of Eq. (9) within the ranges of Eq. (10), respectively.
Inspired by the well-known GLCM texture descriptor for CADx of lesions 7,26 , our task here is to find an adequate number of gray levels for each of the three ranges of Eq. (10). Then, GLCM measures can be computed from the vector image of Eq. (9).
Digitalization of the three variables in their ranges of Eq. (10), respectively. Let Q a and Q b denote the numbers of gray levels of θ in the range [0,2π] and φ in the range [0,π ], respectively. The digitalization of the angular variables θ and ∅ in the vector image domain is given by: where ⌊X⌋ is defined as the infimum of X 28 .
Due to the scale difference between the angular variables ( θ , ∅ ) and the magnitude variable ( ), we introduce a scale mapping parameter τ , where τ is an integer. Let Q c denote the number of gray levels of ρ in the range [ ρ m , ρ M ] and �ρ = ρ M − ρ m . The digitalization of the vector magnitude ρ is given by: Thus, we obtain the gray level vector image from the acquired intensity image, I = I x, y, z , as follows: Our next task is to compute the GLCM-alike measures from this gray level vector image.
Computation of vector-based co-occurrence matrix. Given the above gray level vector image of Eq. (14), we present in this section a method which utilizes derivative-based vector to define a new type of co-occurrence matrix (CM), called vector-based CM or VCM and expressed as: where T(l, m, n) represents the digitalized gray level vector images in 3D space, (L,M,N) indicate the 3D volume size, V 1 and V 2 are a vector pair in T(l, m, n) , and (�l, �m, �n) is the offset from T(l, m, n) to T(l + �l, m + �m, n + �n) , indicating a direction. Figure 2 illustrates the calculation of Eq. (15).
How to find all offsets around one concerned voxel is one of several key steps in the calculation of VCM. It is well known that the nearest neighbors have the closest relevance to the concerned voxel in topology and statistics. Therefore, the 26 nearest neighbor voxels around the concerned voxel are chosen as our candidates to obtain the offsets, i.e. (�l, �m, �n) , in Eq. (15). These offsets can be viewed as the data sampling in the 3D vector image domain along the 26 directions defined by the 26 nearest neighbors 34 .
Due to the limited sample size of pathologically proven medical image datasets, the RF classifier (and all machine learning algorithms) are susceptible to bias from the input data of limited sample size, depending on how it is divided into training and testing data subsets. To reduce this bias effect, 100 randomized groups of lesions are generated, and each group is divided into training and testing sets. The division is made such that there is an equal number of each class (or lesion label) in both the training and testing sets.
We employ a feature selection algorithm from the R package that first reads in the entire list of the vector texture features and generates an importance value for each variable in the feature set based on the GINI index 36 . We use a forward step feature selection (FSFS) method 7,38 to perform lesion classification, where the features are added iteratively to that algorithm in decreasing importance order, such that the best performing features are added first to optimize performance. This procedure is repeated for each of the 100 randomized groups of testing and training datasets. The results are evaluated for each group and for each number of features based on the area under the curve (AUC) of receiver operating characteristics curve (ROC) with average over the 100 groups. Results of the above 100 repeated two-fold cross validation experiments are shown in the following section.

Experiments and results
In this section, we first describe the clinical datasets, which will be used to test the proposed lesion model, and evaluate two parameters associated with the digitalization of gray levels and the scale mapping of the magnitude of the Hessian vector to obtain some preliminary results for further experiments. Then lesion classification is performed on the extracted vector texture features or VTFs. At the end, we compare our classification results with four existing CADx methods and three radiologist experts.
Polyp and nodule datasets. Polyp dataset. A total of 59 patients including 51% males and 49% females, who were scheduled for clinical colonography examination, were recruited to this study under informed consent after approval by the IRB. Their ages range from 45.9 to 91.6 years old (mean age of 66.5 years old). The patients were scanned by a routine clinical non-contrast CT colonography (CTC) scanning protocol covering the entire abdomen volume prior to the clinical colonography examination. Tube voltage was 120 kVp and dose was deter- Pulmonary nodule dataset. A total of 66 patients including 52% males and 48% females, who were scheduled for CT-guided lung nodule needle biopsy, were engaged in this study under informed consent after approval by the IRB. The average age of the patients is 69.5 years old, ranging from 33 to 91 years old. A total of 68 lung nodules with 20 benign and 48 malignant were biopsied under the CT scanning with 120 kVp tube voltage and automatic exposure control. The diameter of these nodules ranges from 0.91 to 13.08 cm (mean size of 3.15 cm). Each CT scan covers a portion of the patient entire chest volume, resulting in 100-200 image slices of 512 × 512 array size, and each image voxel is nearly cubic with edge size of 1 mm. The border of each nodule image slice inside the volumetric patient CT scan was also drawn by experts on a slice-by-slice manner using a semiautomated segmentation algorithm.
Experimental results from polyp dataset. To calculate the VCMs of polyps, the nearest 26 neighbors around the concerned voxel were used to determine the 13 independent offsets or directions which were described in "Vector-based Co-occurrence Matrix or Vector Texture Image". According to the digitalization scheme of Eqs. (11)-(13b), the total gray levels are equal to Q a × Q b × Q c . To study the number of total gray levels (i.e. Q a , Q b , Q c ) as a parameter in calculating the VCM, fifteen different total gray levels were considered in our experiments, and the parameter varies in the range of {16, 24,32,40,48,56,64,72,80,88,96,104,112, 120, 128}. After integer factorization, these 15 total gray levels will generate 235 different combinations of Q a , Q b , andQ c as listed in the Appendix of this paper. These combinations will produce 235 feature sets which will be fed to the RF classifier as training and testing datasets. The parameters which produced the best outcome were then chosen as the total gray levels for the digitalization scheme. The parameters of (2, 2, 32) and (2,8,6) for ( Q a , Q b , Q c ) were chosen in the experiments for classification of polyp and nodule datasets respectively. The above trial-and-error empirical method for parameter determination was also applied to determine the parameter τ for the scale mapping of the vector magnitude in Eq. (13a). We tested this parameter in the range of 1, 2, …, 6. The range of 2, 3, 4, 5 is reasonable choice. τ = 5 was chosen in the experiments for classification of our polyps and τ = 2 is selected for our pulmonary nodules.
Once the two parameters were determined, the following steps are taken to generate the final classification results. For the 63 polyps of limited sample size, including 31 benign and 32 malignant ones, we designed three testing schemes, aiming to perform a thorough investigation. We performed the two-fold cross validation as described in the method section above. To obtain robust results, 100 observation groups were randomly selected from the dataset. For each observation group, we selected 31 polyps as the training set and the remaining 32 polyps as the testing set. The experimental set up is presented in the first row of Table 1. After classification, we evaluate the results with four widely used metrics, i.e. AUC, Accuracy (Acc), Sensitivity (Sen), and Specificity (Spe). The classification results, in terms of average and standard deviation from the 100 observation groups, are shown in the first row of Table 2.
Experimental results from nodule dataset. VCM and VTF calculations of the pulmonary nodules share the same steps as the VCM and VTF calculations of the colon polyps, i.e. going through the 13 independent offsets or directions and performing the digitalization scheme to obtain the 235 different combinations of Q a , Q b , and Q c for the corresponding 235 features. By inputting these 235 features to the RF package and going  www.nature.com/scientificreports/ through the GINI-based feature sorting and the FSFS-based feature selection 7,38 , the two parameters (one is the total number of gray levels and the other is the scale mapping of the vector magnitude) will be determined. Once the two parameters were determined, the following procedure is taken to generate the final classification results, similar to the procedure in generating the final classification results of the polyps. In this classification, the 68 nodules, including 20 benign and 48 malignant, were stochastically divided into 100 groups of training and testing datasets with twofold cross validation as illustrated in the second row of Table 1. The final classification results, in terms of the average and standard deviation over the 100 groups, are shown in the second row of Table 2. Compared to the polyp data results of the first row in Table 1, it is very interesting to note the gap of 8% between the sensitivity and specificity in the very unbalanced nodule dataset (of 20/48) vs. the gap of 4% in the nearly balanced polyp dataset (of 31/32). This may reflect the robustness of the lesion model for unbalanced datasets.
Comparison to other CADx methods. For reference on how well our proposed dynamic lesion model performs, we compare our results to some well-known texture extraction and classification methods and a stateof-the-art CNN deep learning architecture on both the polyp and the nodule datasets as follows.
• Haralick texture method 26 with extended measures (eHM) 7,35 . The Haralick method is widely cited for the GLCM texture descriptor in the field of CADx. • HoG3D 27 -this method counts the occurrences of gradient orientation (i.e. only the angular variables of a vector) in some cropped portions of the original intensity image and generates some histograms which are joined to form gradient features. • CoLIAGe (co-occurrence of local anisotropic gradient orientation 28 -this mothed also employs the gradient angles to extract the entropy of every local patch (or a local voxel group) to form texture features by two joint histograms. • VGG16 29 -it is a widely cited CNN deep learning method. It was implemented by the following architecture: total of 20 salient slices were extracted from each lesion volume and inputted to the VGG16 pipeline for lesion feature extraction and classification. Figure 3 shows the ROC curves of the Hessian vector representation of our lesion model with comparison to the above four CADx methods over the 63 polyps. These ROC curves demonstrate that VTF based polyp classification is the best one among the four state-of-art methods.
In addition to the visual assessment via ROC plots, we further computed the four evaluation metrics. Table 3 shows the comparison results among these four different methods and our proposed lesion model over the 63 colon polyps, where the same RF classifier was used for eHM, HoG3D and CoLIAGe. Our proposed lesion model performed the best. We can see quite clearly that when compared to the eHM, the Hessian vector derived texture features outperformed the Haralick texture features with an AUC of 0.982 against 0.876. Compared against the gradient-based features of HoG3D and CoLIAGe, our proposed vector-based texture features improved the performance substantially over the HoG3D features (AUC = 0.804) and the CoLIAGe texture features (AUC = 0.923). The gain of our lesion model is also substantially higher over the VGG-16 outcome (AUC = 0.833). www.nature.com/scientificreports/ Figure 4 shows the ROC curves of the Hessian vector representation of our lesion model with comparison to the above four CADx methods over the 68 nodules. The ROC curves in this figure illustrate that VTF based lung nodule classification is much better than eHM, HoG 3D, ColIAGe and VGG16. Table 4 shows the comparison results among different methods over the 68 lung nodules, where the same RF classifier was used, except for the deep learning VGG16 which performs feature analysis and classification simultaneously. Our proposed lesion model outperformed all the comparison methods, rendering a similar outcome as that from the 63 polyp dataset. When compared to the eHM, our Hessian vector derived texture features outperformed the Haralick texture features with an AUC of 0.861 against 0.792. Compared against the gradient based features, our proposed vector texture features improved the performance substantially over the HoG3D features (AUC = 0.671) and the CoLIAGe texture features (AUC = 0.654). The gain is also substantially higher over the VGG-16 outcome (AUC = 0.669). Table 3. AUC, accuracy, sensitivity, and specificity values over the 63 colon polyps from the comparative methods and our proposed lesion model.   www.nature.com/scientificreports/ In addition to the comparisons by the visualization with the ROC plots and the quantitative measures with the four metrics of AUC, Acc, Sen and Spe, we further performed experiments to show the statistical significance among all the CADx methods. Using the Wilcoxin ranked t-test, we obtained the quantitative measures for significant difference between the results of our lesion model or Hessian vector representation and the four reference methods. Table 5 shows the P-values between our Hessian vector model and the four reference methods, indicating that the classification performance of our lesion model is statistically significantly better than the performances of the four reference methods for both the polyp and nodule datasets (p-value ≪ 0.05).
Comparison to human observers. Inspired by the competitive game of AlphaGo vs. Expert 30 , we invited three radiologists to score on the above 59 patients' CTC images, each includes the entire abdomen volume. A total of 63 polyps are embedded inside the 59 CTC images. The human experts are experienced radiologists with 15-35 years of clinical practice in their field. They had all the visualization tools in current clinical setting to exam each polyp, and achieved AUC values of 0.869, 0.926, and 0.960, respectively 39 . Our dynamic lesion model, called AlphaPolyp here, outperformed all three radiologists by AUC score of 0.986. By this comparison study, we have gained the insights below about why both AlphaGo and AlphaPolyp can outperform the human experts.
For the AlphaGo, it not only learns how to play the game, but also knows the task of occupying an area as large as possible. Given a current situation for next move, AlphaGo can make the next move to occupy the largest possible nearby area while human expert may not be able to achieve that task every time. For our AlphaPolyp design, the task-driven AI-enabled algorithm is built on the clinical observations (i.e. the human experts' learning process) and is trained for the task of matching the pathological outcome of the true lesions. The AlphaPolyp will take all the learnt observations into consideration to predict the pathological outcome while the human experts may not be able to consider all the observations to predict the pathological outcome.

Conclusions and discussions
A dynamic lesion model was proposed to consider the clinical observation that malignant lesions have a high tendency to invade their surrounding environment compared to benign ones. To mathematically describe the lesion model, one conjecture was made that the lesion growing or invading tendency is related to the rate of image intensity changing at each image voxel, and the rate is represented by the 2nd order derivatives at that voxel, leading to a field of Hessian matrix across the image array space. To circumvent the difficulty of extracting quantitative measures from a matrix field for the task of CADx of lesions, another conjecture was made that the three eigenvalues of the Hessian matrix can be a surrogate representation of the Hessian matrix, and further be treated as a vector in the orthogonal system of the eigenvectors, i.e. Hessian vector.
Inspired by the well-known co-occurrence texture measures for the CADx task, we extracted the co-occurrence vector texture features, i.e. VTFs, from the Hessian vector field or image. Classifying the VTFs for the CADx task on two pathologically proven lesion datasets of polyps and nodules provided striking results, outperforming both the state-of-the-art CADx methods and radiologist experts.
While the VTF was demonstrated as a good feature descriptor of the dynamic lesion model, other vector feature descriptors can be explored for the CADx task. This is one of our future research topics.
While the Hessian vector was demonstrated as a good surrogate of the Hessian matrix, use of the Hessian matrix for generation of matrix feature descriptors for the CADx task is desired. This is another topic of our future research endeavors.
More importantly, by the insights on why AlphaGo can outperform the human experts, we believe task-driven AI-enabled CADx systems would be a direction of our future research endeavors [40][41][42] .
It can be seen that the evaluation study of the VTF descriptor for the dynamic lesion model is limited by the small sample size of both the polyp and the nodule datasets. We have been devoting significant effort to continually acquire more pathologically proven lesion datasets, although the data acquisition is highly costly.