Prediction of skin disease using a new cytological taxonomy based on cytology and pathology with deep residual learning method

With the development of artificial intelligence, efforts to improve the classification of skin disease have increased. However, few studies have examined the classification system currently used worldwide for skin disease, namely the chapter on Diseases of the skin and subcutaneous tissue of the International Classification of Diseases, Tenth Revision (ICD-10). This study aimed to develop a new taxonomy of skin disease based on cytology and pathology and to test its predictive performance against ICD-10. The new taxonomy (Taxonomy 2) contains 6 levels, of which levels 3–5 were evaluated as Projects 2–4. It represents individual diseases in a tree structure with three root nodes: (1) Keratinogenic diseases, (2) Melanogenic diseases, and (3) Diseases related to non-keratinocytes and non-melanocytes. Its predictive performance (accuracy, precision, recall, F1, and Kappa) was compared with that of the ICD-10 chapter on Diseases of the skin and subcutaneous tissue (Taxonomy 1, Project 1) using a deep residual learning method. For each project, 2/3 of the images were assigned to the training group and the remaining 1/3 to the test group, using category (class) as the stratification variable. Both train and test groups in Projects 2 and 3 (Taxonomy 2) had higher F1 and Kappa scores than the corresponding groups in Project 1 (Taxonomy 1), although the differences were not statistically significant, whereas both train and test groups in Project 4 had a statistically significantly higher F1-score than the corresponding groups in Project 1 (P = 0.025 and 0.005, respectively). These results show that the new taxonomy based on cytology and pathology has an overall better predictive performance for skin disease than the ICD-10 chapter on Diseases of the skin and subcutaneous tissue.
Level 5 (Project 4) of Taxonomy 2 generalizes better to unseen data in an AI-assisted diagnosis system than the currently used ICD-10 classification, and may have potential application value in clinical dermatology.

www.nature.com/scientificreports/
Much research has focused on improving diagnostic techniques, especially with artificial intelligence 3 . Binder et al. 4 used computerized image analysis and an artificial neural network to diagnose pigmented skin lesions automatically; the sensitivity and specificity of the computerized system were 90% and 74%, respectively. Verma et al. 5 classified erythemato-squamous diseases with an ensemble of 5 different data mining techniques, and showed that the ensemble used the dataset more efficiently and achieved higher accuracy than the individual techniques.
Sharma et al. 6 compared a Support Vector Machine and an Artificial Neural Network, along with an ensemble of the two, for classifying erythemato-squamous diseases, and found that the ensemble model achieved the highest accuracy.
Moradi and Mahdavi-Amiri 7 proposed a kernel sparse representation-based method for segmentation and classification of melanoma images, and their evaluation showed the approach to be competitive with available state-of-the-art methods.
Yap et al. 8 developed a multimodal classifier that outperformed a baseline classifier using only a single macroscopic image, in both binary melanoma detection and multiclass classification.
Chang and Chen 9 combined decision-tree data mining with neural network classification to construct a predictive model for six major skin diseases, and found that the neural network model had the highest prediction accuracy.
The main work of these investigations is listed in Table 1. However, all of these investigations focused on improving diagnosis with the assistance of artificial intelligence, and little research has addressed the imperfections of the current classification system in dermatology and venereology. The International Classification of Diseases, Tenth Revision (ICD-10) is now used globally to keep disease diagnosis consistent, yet the literature on its shortcomings is scant. Recent studies have found deficiencies in the ICD-10 coding of allergic conditions 10,11 , and a new revision, ICD-11, is currently being developed with the aim of solving these problems 12 . With in-depth research on the pathogenesis of skin disease, knowledge of dermatology has improved, and the initial classifications of multiple diseases have proved inaccurate. For example, pyogenic granuloma sounds like an infectious disease but is actually a type of hemangioma, the classification and nomenclature of vascular malformations have changed 13 , and sebopsoriasis lacks a specific code 14 . Modern dermatology therefore has an urgent need for a more scientific classification. Esteva et al. 15 developed a dermatologist-level system for skin cancer classification; although the aim of that study was to test an artificial intelligence capable of classifying skin cancer, it points toward re-classifying skin disease from different perspectives.
Based on these considerations, we conducted this study to develop a new taxonomy based on cytology and pathology, to test its predictive performance with a deep residual learning method, and to compare it with the ICD-10 chapter on Diseases of the skin and subcutaneous tissue, in order to find a classification that benefits prediction and has potential application in clinical dermatology and venereology. Figure 1 shows the overall methodology of this research; the approach is completely data driven.
Taxonomy 2. Taxonomy 2 represents 1,000 individual diseases arranged in a tree structure with three root nodes: (1) Keratinogenic diseases (KCs), (2) Melanogenic diseases (MCs), and (3) Diseases related to non-keratinocytes and non-melanocytes (Non-KC and non-MC). Taxonomy 2 was derived by dermatologists using a bottom-up procedure: individual diseases, initialized as leaf nodes, were merged based on organ or cellular similarity until the entire structure was connected. Taxonomy 2 contains 6 levels; levels 1–3 are shown in Fig. 2. At each level, each type of disease is identified by a distinct number, down to level 6.
The taxonomy is used to generate training classes that are both well suited to machine learning classifiers and medically relevant. The root nodes are used in the first validation strategy and represent the source cell of the disease. Level 3 of Taxonomy 2 is defined as Project 2 and contains 2 classes: Inflammatory diseases and Infectious diseases. Level 4 of Taxonomy 2 is defined as Project 3 and contains 4 classes: Virus, Parasite, Bacteria, and Dermatitis. Level 5 of Taxonomy 2 is defined as Project 4 and contains 11 classes: porokeratosis; genital herpes simplex; lichen planus; condylomata acuminata; ichthyosis; viral exanthems; pediculosis pubis; pemphigus; gonorrhea; eczema; and Norwegian scabies.
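Because Projects 2–4 are simply levels 3–5 of one tree, each image's class in every project can be read off by walking up from its leaf diagnosis. The sketch below illustrates this under stated assumptions: each node stores its level (roots at level 1), the class `Node` and method `ancestor_at` are hypothetical names, and the level-2 and level-6 node names, as well as placing this branch under the KC root, are made up for illustration and are not taken from the study's taxonomy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A taxonomy node; root nodes are level 1, individual diseases are leaves."""
    name: str
    level: int = 1
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, name: str) -> "Node":
        """Attach a child one level below this node and return it."""
        child = Node(name, level=self.level + 1, parent=self)
        self.children.append(child)
        return child

    def ancestor_at(self, level: int) -> "Node":
        """Walk up the tree to this node's ancestor at `level` (1 = root)."""
        if not 1 <= level <= self.level:
            raise ValueError(f"{self.name!r} has no ancestor at level {level}")
        node = self
        while node.level > level:
            node = node.parent
        return node


# Hypothetical path: the level-2 and level-6 names, and placing this branch
# under the KC root, are illustrative and not taken from the study.
root = Node("Keratinogenic diseases (KCs)")
level2 = root.add("hypothetical level-2 group")
infectious = level2.add("Infectious diseases")   # level 3 -> Project 2 class
virus = infectious.add("Virus")                  # level 4 -> Project 3 class
condyloma = virus.add("condylomata acuminata")   # level 5 -> Project 4 class
leaf = condyloma.add("hypothetical level-6 disease")

# Project name -> taxonomy level, as defined in the text.
PROJECT_LEVELS = {"Project 2": 3, "Project 3": 4, "Project 4": 5}
labels = {p: leaf.ancestor_at(lvl).name for p, lvl in PROJECT_LEVELS.items()}
print(labels)
# {'Project 2': 'Infectious diseases', 'Project 3': 'Virus', 'Project 4': 'condylomata acuminata'}
```

Labeling each image once at the leaf and deriving the coarser classes from the tree keeps the three projects mutually consistent by construction.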

Data processing instructions.
A total of 1,847 images were extracted according to Taxonomy 2. The images were then screened so that both taxonomies contained the same images, leaving a total of 1,160 images.
Predictive model evaluation by deep residual network. After annotation of the images, predictions under both taxonomies were made with Deep Residual Learning for Image Recognition (ResNet), a convolutional neural network (CNN). For a fair comparison, we adopted ResNet-50 pre-trained on ImageNet as the feature extraction network. Specifically, an SGD optimizer with momentum 0.9 and weight decay 5e-4 was used, with an initial learning rate of 1e-4. The batch size was 64 and the dropout rate 0.5.
Labeling images according to Taxonomy 1: Project_1 contains the label information of each image under the Taxonomy 1 classification system, where entity_id is the unique ID of the image. Code_1 gives the number of images in each category under Taxonomy 1, where code_id is the unique category ID.
Labeling images according to Taxonomy 2 (levels 3–5): Project 2, Project 3, and Project 4 contain the label information of each image at levels 3, 4, and 5 of Taxonomy 2, respectively, where entity_id is the unique ID of the image. Code_2 gives the number of images in each Taxonomy 2 category for Projects 2, 3, and 4, respectively, where code_id is the unique category ID.
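One simple way to implement the screening step (keeping only images labeled under both taxonomies) and to rebuild Code_1/Code_2 style counts is an intersection on entity_id followed by a count per code_id. The sketch assumes each project's labels are held as a dictionary from entity_id to code_id; the function names and the toy IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def align_taxonomies(taxonomy_1: Dict[str, str],
                     taxonomy_2: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Keep only images (entity_id) that carry a label under both taxonomies."""
    shared = taxonomy_1.keys() & taxonomy_2.keys()
    return ({e: taxonomy_1[e] for e in shared},
            {e: taxonomy_2[e] for e in shared})


def count_per_code(labels: Dict[str, str]) -> Dict[str, int]:
    """Number of images per code_id, i.e. a Code_1 / Code_2 style summary."""
    return dict(Counter(labels.values()))


# Toy labels; entity_ids and code_ids are made up for illustration.
t1 = {"img1": "c1", "img2": "c2", "img3": "c1"}
t2 = {"img1": "k7", "img3": "k7", "img4": "k2"}
t1_kept, t2_kept = align_taxonomies(t1, t2)
print(sorted(t1_kept), count_per_code(t1_kept), count_per_code(t2_kept))
# ['img1', 'img3'] {'c1': 2} {'k7': 2}
```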
For each project, 2/3 of the images were assigned to the training group and the remaining 1/3 to the test group, using category (class) as the stratification variable.
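A stratified 2/3 to 1/3 split of this kind can be sketched with the standard library. It assumes the labels are a mapping from entity_id to class, and that each class is shuffled with a fixed seed and cut at round(2/3 × class size); the function name and seed are illustrative. scikit-learn's `train_test_split(..., test_size=1/3, stratify=y)` is a common off-the-shelf alternative, though the text does not say which tool was used.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(labels: Dict[str, str], train_frac: float = 2 / 3,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split entity_ids so each class contributes ~train_frac of its images to training."""
    by_class: Dict[str, List[str]] = defaultdict(list)
    for entity_id, code_id in sorted(labels.items()):  # sorted for reproducibility
        by_class[code_id].append(entity_id)
    rng = random.Random(seed)
    train: List[str] = []
    test: List[str] = []
    for code_id in sorted(by_class):
        ids = by_class[code_id]
        rng.shuffle(ids)
        n_train = round(len(ids) * train_frac)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Toy example: 9 images of class "a" and 6 of class "b".
labels = {**{f"a{i}": "a" for i in range(9)}, **{f"b{i}": "b" for i in range(6)}}
train, test = stratified_split(labels)
print(len(train), len(test))  # 10 5
```

Splitting within each class keeps rare diseases represented in both groups, which matters here because the images are unbalanced across disease types.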
The accuracy, Kappa coefficient, Precision, Recall, and F1-score were calculated and compared between the two taxonomies.
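Accuracy and the Kappa coefficient can both be computed from a confusion matrix. The sketch below assumes the reported Kappa is Cohen's kappa, which the text does not state explicitly; it measures agreement between predicted and true labels after removing the agreement expected by chance. The function names are illustrative; scikit-learn's `cohen_kappa_score` computes the same quantity.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Matrix:
    """cm[i][j] counts images whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: Matrix) -> float:
    """Fraction of all images whose predicted class equals the true class."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Matrix) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(n)) / total                     # observed agreement
    true_tot = [sum(cm[i]) for i in range(n)]                         # images per true class
    pred_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]    # images per predicted class
    p_e = sum(true_tot[k] * pred_tot[k] for k in range(n)) / total ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


cm = confusion_matrix([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], n_classes=3)
print(round(accuracy(cm), 4), round(cohen_kappa(cm), 4))  # 0.8333 0.7391
```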

Results
The overall comparison of predicted results between projects. Table 2 shows the predicted results of the projects by category. Only Project 4 has a higher accuracy in predicting skin disease than Project 1. Except for the test group in Project 3, all train and test groups in Projects 2, 3, and 4 from Taxonomy 2 have higher precision than the corresponding groups in Project 1 from Taxonomy 1, although none of the differences is statistically significant. For recall, both train and test groups in Projects 2, 3, and 4 from Taxonomy 2 are better than the corresponding groups in Project 1 from Taxonomy 1, but only the test group in Project 4 has a statistically significantly higher recall than the test group in Project 1 (P = 0.016).
For the F1-score, both train and test groups in Projects 2, 3, and 4 from Taxonomy 2 are better than the corresponding groups in Project 1 from Taxonomy 1, and both the train and test groups in Project 4 have a statistically significantly higher F1-score than the corresponding groups in Project 1 (P = 0.025 and 0.005, respectively).
All train and test groups in Projects 2, 3, and 4 from Taxonomy 2 have a higher Kappa value for predicting skin disease than the corresponding groups in Project 1 from Taxonomy 1.

Comparisons among classes within projects.
In Project 1, all parameters, including sensitivity (recall), specificity, positive predictive value (PPV, precision), negative predictive value (NPV), and F1, for the 11 diseases are better in the train group than in the test group (Table 3). For some diseases, especially gonococcal infection and herpesviral infections, F1 in the test group is much lower than in the train group.
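The per-class parameters compared in Tables 3–6 follow the one-vs-rest convention: each disease is treated in turn as the positive class and all other diseases as negative. A minimal sketch from a confusion matrix is shown below. It assumes rows are true classes and columns predicted classes, and returns 0.0 when a ratio's denominator is zero (a convention chosen here, not taken from the text). Running it separately on train and test predictions gives the kind of side-by-side comparison described above.

```python
from typing import Dict, List

Matrix = List[List[int]]


def _ratio(num: int, den: int) -> float:
    # Convention chosen here: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


def one_vs_rest(cm: Matrix) -> List[Dict[str, float]]:
    """Per-class metrics; rows of cm are true classes, columns predicted classes."""
    n = len(cm)
    total = sum(map(sum, cm))
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp
        sens = _ratio(tp, tp + fn)                   # sensitivity = recall
        ppv = _ratio(tp, tp + fp)                    # PPV = precision
        metrics.append({
            "sensitivity": sens,
            "specificity": _ratio(tn, tn + fp),
            "ppv": ppv,
            "npv": _ratio(tn, tn + fn),
            "f1": 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0,
        })
    return metrics


cm = [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
for k, m in enumerate(one_vs_rest(cm)):
    print(k, {name: round(v, 3) for name, v in m.items()})
```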
In contrast, in Projects 2–4 of Taxonomy 2, all of these parameters are similar between the train and test groups at each classification level (Project 2/Level 3, Table 4; Project 3/Level 4, Table 5; Project 4/Level 5, Table 6).

Discussion
Descriptive dermatology of the morphological phenomena of the skin has developed over more than two thousand years 16 . Historically, skin disorders were separated according to their location, their appearance or, more interestingly, their suspected cause. Consequently, the textbooks that have shaped our education have adopted quite different ways of presenting and classifying skin diseases 17 . Classification by similarity became increasingly difficult as the complexity of disease was recognized 18 . A new classification that may help diagnosis, disease management, and development of the discipline is urgently needed. This study developed a new taxonomy (Taxonomy 2) of most skin diseases based on cytology and pathology, containing 6 levels, of which levels 3–5 were evaluated as Projects 2–4. This is new work in dermatology and venereology, in contrast to previous studies that focused on classifying one or several skin diseases with AI techniques 4–9 .
To investigate the predictive effect of the new taxonomy on skin disease, we compared its accuracy, precision, recall, F1, and Kappa with those of ICD-10 using a deep residual learning method. Precision, recall, and F1-score are commonly used to evaluate the predictive performance of models in multi-class prediction. For a given class, precision is the number of samples correctly predicted as that class divided by the number of all samples predicted as that class; it is equivalent to the positive predictive value. Recall is the number of samples correctly predicted as a class divided by the number of all samples that truly belong to it; it is equivalent to sensitivity. The two often trade off against each other, and the F1-score balances them. Deep CNNs have broad potential application in the diagnosis of skin diseases, with reported accuracy higher than that of human dermatologists 19,20 ; we therefore applied one to predict diseases under the different taxonomies, which also avoids the inconsistency of human raters.
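Precision, recall, and F1 for a class k can be written in terms of its true positives TP_k, false positives FP_k, and false negatives FN_k. The last line, a macro average over K classes, is an assumption: the text does not say how per-class values were aggregated into the project-level scores.

```latex
\begin{aligned}
\mathrm{Precision}_k &= \frac{TP_k}{TP_k + FP_k}, \qquad
\mathrm{Recall}_k = \frac{TP_k}{TP_k + FN_k},\\
F1_k &= \frac{2\,\mathrm{Precision}_k\,\mathrm{Recall}_k}{\mathrm{Precision}_k + \mathrm{Recall}_k},\\
\overline{M} &= \frac{1}{K}\sum_{k=1}^{K} M_k, \qquad M \in \{\mathrm{Precision}, \mathrm{Recall}, F1\}.
\end{aligned}
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are high, which is why it is used to weigh the two.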
Our results showed that the new taxonomy performed better on most parameters, and that its most detailed evaluated level (level 5, Project 4) had a significantly higher F1-score than the ICD-10 taxonomy. This suggests that it may generalize better to unseen data and may provide a better taxonomy for AI-assisted skin disease prediction in the future.
The literature on the shortcomings of ICD-10 is scant. A version of ICD-10 specifically adapted to dermatology was produced in Spain in 1999 to overcome these shortcomings. González-López et al. 21 confirmed that the ICD-10 system has some minor shortcomings in coding certain diseases, particularly newly discovered and emerging diseases. A classification of hypersensitivity/allergic diseases was constructed and validated for ICD-11 by crowdsourcing the allergist community 11 , because the well-known misclassification and/or under-notification of these diseases in the ICD has a direct and major detrimental impact on hypersensitivity/allergic disease data 22 . However, a reclassification of the whole discipline of dermatology has not yet been attempted, so we constructed a new taxonomy in this study. The results confirmed that Taxonomy 2 has an advantage over ICD-10 in predicting skin disease, which may have potential application value in future clinical practice in dermatology and venereology.
The current study has the following limitations. 1. AI is the only detection technology used for the comparison, and it is not a gold standard for prediction, so its systematic error may affect the comparison. 2. The dermatological data did not include histopathological images, which may reduce classification accuracy. 3. The train and test groups of Project 1 differ on all three parameters, and Projects 3 and 4 show a difference in precision and F1-score, respectively. We divided the images into two groups to guard against model overfitting, in which a model performs well on the training group but predicts poorly on other data. We used 2/3 of the data to build the model and tune its parameters; however, the differences between the train and test groups lower the credibility of the results. In addition, the images of different disease types are not balanced, which may result from insufficient image quality for some types of skin disease.

Conclusion and future work
In conclusion, this study is an attempt at a more precise and effective classification in dermatology to support development of the discipline. The new taxonomy based on cytology and pathology is an innovation that challenges the current ICD-10 classification in dermatology, and was shown to have an overall better predictive performance than ICD-10, including sensitivity and recall, specificity, PPV and precision, NPV, and F1.
The new taxonomy has potential application value in clinical practice using AI techniques for skin disease prediction. However, a more comprehensive system covering more skin diseases and incorporating other data, including dermoscopic and histopathological images, is needed to further confirm the stability of the taxonomy.

Data availability
The data that support the findings of this study are available from the first author (Jin Bu, dr.jinbu@gmail.com) upon reasonable request.