Skin lesion classification of dermoscopic images using machine learning and convolutional neural network

Detecting dangerous illnesses connected to the skin organ, particularly malignancy, requires the identification of pigmented skin lesions. Image detection techniques and computer classification capabilities can boost skin cancer detection accuracy. The dataset used for this research work is based on the HAM10000 dataset which consists of 10015 images. The proposed work has chosen a subset of the dataset and performed augmentation. A model with data augmentation tends to learn more distinguishing characteristics and features rather than a model without data augmentation. Involving data augmentation can improve the accuracy of the model. But that model cannot give significant results with the testing data until it is robust. The k-fold cross-validation technique makes the model robust which has been implemented in the proposed work. We have analyzed the classification accuracy of the Machine Learning algorithms and Convolutional Neural Network models. We have concluded that Convolutional Neural Network provides better accuracy compared to other machine learning algorithms implemented in the proposed work. In the proposed system, as the highest, we obtained an accuracy of 95.18% with the CNN model. The proposed work helps early identification of seven classes of skin disease and can be validated and treated appropriately by medical practitioners.

www.nature.com/scientificreports/ the lesion, as well as modified for visual contrast and color reproduction, at first. Each image and patient had seven features, namely, age of the patient, sex of the patient, lesion id which is a unique identifier for a particular type of lesion, image id which is a unique identification number for an image, dx type for technical validation, Skin lesion's geographical location, and a diagnostic skin lesion category which is a classification of skin lesions that can be used to diagnose a condition. The patients were mostly between the ages of 35 and 70. The ground truth of the data set was represented by the technical validation field category, which revealed how the skin lesion diagnosis was made. Ground truths were divided into four categories by the researchers, namely, Histopathology, Confocal, Follow-up, and Consensus. In the Histopathology category, dermatopathologists diagnosed excised lesions histopathologically. All images were manually evaluated with the relevant histopathologic diagnosis and confirmed for plausibility by the researchers. In the Confocal category, the reflectance confocal microscopy method is used that provides near-cellular resolution, and it was used to confirm the presence of some benign keratoses on the face. In the Follow-up category, the researchers recognized images as proof of biological benignity, if nevi examined with digital dermoscopy confirmed no modifications during 3 follow-up visits or 1.5 years. The consensus category consists of normal benign instances with no follow-up or histology, as well as examples in which two experts have given the same unequivocal benign diagnosis.
Histopathology was used to diagnose more than half of the skin lesions. The back, lower limbs, and trunk are all significantly impacted skin cancer locations, as demonstrated in the data set's localization distribution. In terms of diagnostic skin lesion categories, the data set included seven different classes. Figure 1 depicts a selection of sample images from the HAM10000 dataset for each of the classes. The following are the seven categories: The proposed work is organized as follows: Section "Literature survey" contains information on similar research in this field. Section "Proposed methodology" details the methodology performed on the data. Section "Results and discussion" discusses all of the model's performances; Section "Conclusion and future work" contains the conclusions derived from the research and the future work and next steps for this project.

Literature survey
Dhivyaa et al. 1 have proposed a model that can produce feature maps of high resolution that can be used to assist in the preservation of the image's spatial information. On two separate datasets, the authors proposed Random Forest and Decision Tree algorithms for skin class categorization. Polat, Kemal, and Kaan Onur Koc 5 have proposed a system that uses no filtering and feature extraction. Authors claim to have obtained very encouraging results in the identification of skin lesions. Kumar et al. 6 have proposed a system with RGB color-space, GLCM, and Local Binary Pattern (LBP) methods for pre-processing and image segmentation that uses fuzzy-c clustering and obtained encouraging results in the identification of skin lesions. Adegun and Viriri 7 have proposed a system with an encoder-decoder Conditional Random Field (CRF) module for contour refining and lesion boundary localization that uses a linear combination of Gaussian kernels for paired edge potentials. A Fully Convolutional Network (FCN) with hyper-parameters tuning gave good results. Data augmentation strategies have been proposed by Srinivasu et al. 8 to balance various forms of lesions to the same range of images. The proposed model, which is predicated on the LSTM and MobileNet V2 approaches, was found to be effective in classifying and detecting skin diseases with little effort and computational resources. When using CNN transfer learning, Mahbod et al. 9 confirmed that image size affects skin lesion categorization performance. They also showed that image cropping outperforms image resizing in terms of performance. Finally, the best classification performance is demonstrated using a simple ensembling strategy that merges the findings from images clipped at six scales and three fine-tuned CNNs. Zhang et al. 10 presented an efficiency result of CNN that was optimized using an upgraded version of the whale optimization technique. This technique is used to find the best weights and biases in the network to reduce the difference between the network output and the desired output.
Hameed et al. 11 proposed a Multi-Class Multilevel (MCML) classification technique inspired by the "divide and conquer" strategy. The proposed classification algorithm combines machine learning and deep learning methods. Hasan et al. 12 proposed the DSNet, an automatic semantic segmentation network for skin lesions. To reduce the number of parameters and make the network lighter, they used a separable depth-wise convolution. Hosny et al. 13 used pre-trained AlexNet with transfer learning. As initial values, the parameters from the original model were used. and the weights of the last three replaced layers were randomly initialized. ISIC 2018, the most recent public dataset, was used to test the suggested technique. Chatterjee et al. 14 used SVM with RBF to identify three lesions by extracting form, fractal dimension, texture, and color variables using fractal-based regional texture analysis (FRTA). Pereira et al. 15 proposed integrating gradients with the local binary patterns (LBP) technique to increase the performance of skin lesion classification algorithms and to further exploit the border-line properties of the lesion segmentation mask.
A kernel sparse coding approach for both segmentation and classification of skin lesions 16 . A skin lesion detection system was proposed by Garcia-Arroyo et al. 17 with fuzzy histogram thresholding, lesions were segmented. Using the ABCD rule, Zaqout 18 has developed a model capable of partitioning and classifying skin pictures. In the majority of these efforts, the ABCD rule is followed, which includes image pre-processing, segmentation to locate the lesion, feature extraction, and classification. TDS is a dermoscopy score that assists in the diagnosis of the lesion condition. Khan et al. 19 have proposed a system that uses the deep learning model MASK-RCNN for segmentation and pre-trained DenseNet for classification. Using YOLOv5, Shelatkar et al. 20 describe a deep learning-based method for classifying and identifying brain tumours. To extract the features, a transfer learning concept is used which is then exposed to selection and classification stages. Khan et al. 21 developed an automated method for skin lesion categorization that used pre-trained RESNET-50 and RESNET-101 deep neural network (DCNN) with transfer learning was employed for feature extraction and optimal feature selection based on the kurtosis-controlled principle component (KcPCA). The information was then combined and the best features were chosen, which were then fed into a supervised learning algorithm-SVM of kernel function radial basis function (RBF) for classification.
Khan et al. 22 developed a segmentation and classification framework based on deep learning. For skin lesion segmentation, a MASK R-CNN-based architecture with a Resnet50 feature pyramid network (FPN) is used. The final mask is then generated by mapping connected layer-based features. A 24-layer convolutional neural network architecture is built during the classification phase, with activation based on the display of higher characteristics. Finally, the best CNN features are delivered to softmax classifiers for final classification. Harris Hawks Optimization with Deep Learning Model for Detection of Diabetic Retinopathy was proposed by Gundluru et al. 23 . Tajeddin et al. 24 used highly discriminative characteristics to classify skin melanoma. For lesion segmentation, the authors started with contour propagation. To extract features, lesions were mapped in log-polar space using www.nature.com/scientificreports/ Daugman's transformation based on the peripheral area. Finally, the various classifiers used in the proposed work were evaluated.
To construct a dermoscopic skin image recognition system, Yu et al. 25 recommended CNN and the local descriptor encoding approach. To extract skin lesion features from images, the authors utilized ResNet101 and ResNet50. Using a Fisher vector (FV) and the collected ResNet features, a global image representation was generated. Finally, a Chi-squared kernel was applied in an SVM for classification. Melanoma was categorized into three groups based on the thickness of the lesion 26 . The researchers utilized two categorization schemata: one classified lesion as thin or thick, and the other separated them into thin, moderate, and thick categories. To categorize the lesion data, a combination of ANN and logistic regression algorithms is used. Using a cloud computing system, Rajput et al. 27 suggested diabetes diagnosis to Indians living in rural locations. To construct a breast cancer detection system, Abbas et al. 2 proposed the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) to present a unique approach called BCD-WERT for efficient feature selection and classification. To detect ships from satellite imaginary, deep CNN with YOLOv3 for object detection with SHA-256 hashing for security to the detected images 3 . For the purpose of diagnosing heart disease, Reddy et al. 28 suggested a hybrid genetic algorithm and a fuzzy logic classifier.
To conclude, many researchers have contributed to classifying the skin lesion categories using distinct machine learning and deep learning approaches. Also, they have worked on a variety of data sets. The proposed work mainly aims at classifying the skin lesion categorization on the HAM10000 dataset, which is used by a few researchers. The proposed work concentrates on categorizing the skin lesion into seven classes using different machine learning and Convolutional Neural Network technique and obtained comparatively better results.
The main contributions of the proposed work. The HAM10000 dataset images are highly unbalanced. For instance, in the dataset, we observe that Dermatofibroma (df) skin lesion class has a count of 115 images which is the smallest size. The advantage of the proposed work is that we developed a model which is computationally much efficient as compared to existing work as they have considered the entire dataset. This is because a majority of the existing work has considered the entire unbalanced dataset and then they have augmented it to make sure the dataset is balanced. Here they have considered the small class training set (For instance Dermatofibroma (df) skin lesion class) which is augmented around 50 times, which we have avoided in our proposed work. The other contributions include: • Resizing the medical dermoscopic lesion images to reduce memory consumption and improve latency.
• Data augmentation to overcome the limited number of images and reduce overfitting.
• Global feature descriptors allow the system to extract skin lesion features efficiently even with a little training dataset. • The proposed customized convolution neural network has hyperparameters to train the model and also, the softmax function is used at the end of the fully connected layer of the CNN. Hence the proposed CNN model works well with binary and multi-class detection. • To make the model more trustworthy, quicker, and error-free, different evaluation metrics, and 10-fold crossvalidation are applied. • Pipeline for developing a support tool for skin lesion medical practitioners.

Proposed methodology
We developed a fully automated approach for detecting and classifying skin lesions using Machine Learning and customized Convolutional Neural Networks. The proposed work concentrated on pre-processing and classification. The standard HAM10000 dataset is used in the proposed work which contains 10015 skin lesion images divided into seven categories. The steps involved in the proposed work is depicted in Fig. 2. Image pre-processing. The proposed work applied the following image pre-processing steps.
Step 1 -ordering the dataset. As the dataset images are out of order, sorting each image within each folder by the seven diseases is the necessary step. 'Image id' and ' dx' were the most crucial parameters for arranging the images in this scenario. In the dataset, we observe that df skin lesion count has a number 115 which is the smallest size. As a result, choosing 100 images per class and using a dataset of 100*7 images to train the model is insufficient to gain better classification accuracy. As a result, more data will be generated, and data augmentation will be used to achieve this task.
1. All the images in the folder are resized to 220*220 before processing into different machine learning models. 2. For the customized CNN model, images are scaled to 96 × 96 with a depth of 3 to speed up the process.
Then we have converted the images into a NumPy array to get the value of each pixel of the image. Then we normalized the pixel values to a range of 0-1. The LabelBinarizer class allows us to input class labels that are in string form in the dataset, convert those class labels into one-hot encoded vectors, and then convert them back into a human-readable form from the integer class label prediction of Keras CNN.
Step 3 -Data augmentation. www.nature.com/scientificreports/ 1. Data augmentation is a technique for generating new "data". To train the machine learning models, the proposed method used Horizontal Flip augmentation i.e., shifting all pixels of an image in a horizontal direction. As a result, models with data augmentation are more likely to learn more differentiating characteristic features than models without data augmentation. We have 200 images from each class after augmentation and trained the model with a dataset of 200*7 images. Figure 3 depicts the sample images after Horizontal Flip augmentation.    Table 1.

Convolutional neural network.
In contrast to a standard neural net, a CNN learns detailed patterns by applying filters to the raw pixels of an image. To build a CNN, Tensorflow and Keras libraries were used to build and implement the model in Python 3.7.9. A high-level overview of CNN Architecture is shown in Fig. 4. The layers and hyperparameters employed in the network are summarized in Table 2.

Model hyperparameters.
To get a better model evaluation, certain common hyperparameter values are used. The hyperparameter values utilized in the CNN model are highlighted in Table 3. The following section explains why the values of the hyperparameters were chosen in the proposed work: Optimizer: Adam is the most widely used optimization method for training deep neural networks today because it is simple to use, computationally efficient, and effective when dealing with enormous amounts of data and parameters. Loss Function: The Multi-Class calculates the loss value using the "categorical cross-entropy" loss function. Epochs: The epoch count is 150. Found that 150 epochs result in a model with low loss and no overfitting to the training set through experimentation (or not overfitted as best as we can). Batch Size: Several early tests with batch sizes of 5, 10, 20, and 40 found that batch size 32 produced the best results. Learning Rate: The rate of learning is initially set to  www.nature.com/scientificreports/ 0.001. The "step" we take along the gradient is controlled by the learning rate. The smaller the value, the smaller the step, and the larger the value, the bigger the step.

Results and discussion
All of the Machine Learning models and CNN were trained and tested on a Windows 10 computer with an Intel i5 processor and 8GB of RAM. The models were created with Spyder 5 and Python 3.7.9, with Keras, Imutils, and cv2Numpy as dependencies. The accuracy of the machine learning models with two methods involving/not involving augmentation is shown in Table 4. From the Table 4 results, we have observed that the Random Forest Machine algorithm provides better accuracy compared to other machine learning algorithms. The experimental results of CNN and Machine learning outcomes with the use of model accuracy and weighted average of precision, recall and F1-score is shown in Table 5.When using K-fold cross-validation, the accuracy measure is the mean of the accuracies of the K-models, not simply the accuracy of one model. Table 6 gives the customized CNN model's accuracy and loss for training and testing sets. As seen in this table, the proposed customized CNN model has a performance difference of 9% between the training and testing accuracy. After 150 epochs of training, the model achieved low loss with minimal overfitting. We could also improve our accuracy by adding more training data. Figure 5 depict the CNN model's training history versus the number of epochs. This is an important visualization to ensure that the model continued to train with each epoch, improving its accuracy and reducing losses while aiming to optimize the objective function. The figures are visible in the code. The testing sets accuracy, recall, precision, and f1-score associated with the model are shown in Table 7.
In Fig. 6 we compare proposed CNN and Machine learning outcomes with the use of model accuracy and weighted average of precision, recall, and f1-score. This is the graphical representation of Table 5. Table 8 gives   www.nature.com/scientificreports/ the comparison of the accuracy obtained in the proposed work with the recent existing work on the HAM10000 dataset. The first row in this table gives the accuracy of the proposed CNN model. We have also implemented Inception V3, a pre-trained model, and compared the accuracy with the proposed work. The InceptionV3 model resulted in an accuracy of 95.14%. Even though there is not much difference in    www.nature.com/scientificreports/ the accuracy, the proposed CNN model has computationally less overhead in terms of total training time as compared to the InceptionV3 model due to more layers in the network.

Conclusion and future work
The proposed work has applied machine learning and CNN techniques to classify the skin lesion images. The experiments were conducted on the HAM10000 dataset. The machine learning and the customized CNN techniques were evaluated after the experiments based on Accuracy, Precision, Recall, and F1-Score. Before the training/testing phase, the images were pre-processed, then separated into feature and target values, and formed data augmentation. The results show that the customized CNN has obtained an accuracy of 95.18%, which is better than the proposed machine learning algorithms. This suggests that the proposed CNN has a better classification performance for the HAM10000 data set. The proposed work has been compared with the recent existing work on the same data set and proved to obtain better accuracy with minimum loss and errors. As a future work, researchers can improve CNN architecture and implementation by fine-tuning hyper parameters such as the number of layers, type of layers, and hyper parameter values for the layers and can explore other pre-trained CNN models. Researchers can also focus on image segmentation and Skin lesion categorization in real-time with better accuracy and minimum time.