A deep learning system for differential diagnosis of skin diseases

Abstract

Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are seen instead by general practitioners with lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.
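
To make the reported metric concrete: a case counts as a top-k hit when the reference-standard condition appears among the first k entries of the ranked differential. The following Python sketch is illustrative only; it is not the study's evaluation code and simplifies the reference standard to a single condition per case, whereas the study aggregated differentials from a panel of dermatologists.

```python
# Illustrative sketch of top-k accuracy over ranked differential diagnoses.
# Assumes one reference-standard condition per case (a simplification).
def top_k_accuracy(differentials, references, k=1):
    """differentials: list of ranked condition lists (most likely first).
    references: reference-standard condition for each case."""
    hits = sum(ref in diff[:k] for diff, ref in zip(differentials, references))
    return hits / len(references)

# Hypothetical cases:
diffs = [['eczema', 'psoriasis', 'tinea'], ['acne', 'rosacea', 'folliculitis']]
refs = ['psoriasis', 'acne']
print(top_k_accuracy(diffs, refs, k=1))  # 0.5
print(top_k_accuracy(diffs, refs, k=3))  # 1.0
```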

Fig. 1: Overview of the development and validation of our DLS.
Fig. 2: Performance of the DLS and the dermatologists (Derm), primary care physicians (PCPs) and nurse practitioners (NPs).
Fig. 3: Representative examples of challenging cases missed by non-dermatologists.
Fig. 4: Importance of different inputs to the DLS.

Data availability

The de-identified teledermatology data used in this study are not publicly available due to restrictions in the data-sharing agreement.

Code availability

The deep learning framework (TensorFlow) used in this study is available at https://www.tensorflow.org/. The training framework (Estimator) is available at https://www.tensorflow.org/guide/estimators. The deep learning architecture (Inception-v4) is available at https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v4.py.
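
For orientation, the sketch below shows in schematic form how an image classifier can be wired into the Estimator framework named above. It is not the authors' released code: a small stand-in convolutional network replaces the Inception-v4 backbone from the linked TF-Slim repository, and the 26-class output, feature key and hyperparameters are illustrative assumptions.

```python
import tensorflow as tf  # TensorFlow 1.x, matching the Estimator API cited above

NUM_CLASSES = 26  # primary skin conditions; the paper also reports a 419-way output

def model_fn(features, labels, mode):
    images = features['image']  # assumed input key: batch of preprocessed RGB photographs
    # Stand-in backbone; the study used Inception-v4 (see link above).
    net = tf.layers.conv2d(images, filters=32, kernel_size=3, activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
    net = tf.layers.flatten(net)
    logits = tf.layers.dense(net, NUM_CLASSES)
    probs = tf.nn.softmax(logits)

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={'probabilities': probs})

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir='/tmp/dls_sketch')
```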

Acknowledgements

We thank W. Chen, J. Yoshimi, X. Ji and Q. Duong for software infrastructure support for data collection. Thanks also go to G. Foti, K. Su, T. Saensuksopa, D. Wang, Y. Gao and L. Tran. We also thank C. Chen, M. Howell and A. Paller for their feedback on the manuscript. Finally, this work would not have been possible without the participation of the dermatologists, primary care physicians and nurse practitioners who reviewed cases for this study, and S. Bis, who helped to establish the skin condition mapping.

Author information

Authors and Affiliations

Authors

Contributions

Yuan Liu, A.J., C.E., D.H.W., K.L. and D.C. prepared the dataset for use. S.J.H., K.K. and R.H.-W. provided clinical expertise and guidance for the study. Yuan Liu, A.J., C.E., K.L., P.B., G.d.O.M., J.G., D.A., S.J.H. and K.K. worked on the technical, logistical and quality control aspects of label collection. S.J.H. and K.K. established the skin condition mapping. Yuan Liu, K.L., V.G. and D.C. developed the model. Yuan Liu, A.J., N.S. and V.N. performed the statistical and additional analyses. Yun Liu guided study design, analysis of the results and statistical analysis. S.G. studied the potential utility of the model. R.C.D. and D.C. initiated the project and led the overall development, with strategic guidance and executive support from G.S.C., L.H.P. and D.R.W. Yuan Liu, Yun Liu and S.J.H. prepared the manuscript with assistance and feedback from all other co-authors. K.K. and S.J.H. performed the work at Google Health via Advanced Clinical. G.d.O.M. performed the work at Google Health via Adecco Staffing. N.S. performed the work at Google Health.

Corresponding author

Correspondence to Yun Liu.

Ethics declarations

Competing interests

K.K. and S.J.H. were consultants of Google LLC. R.H.-W. is an employee of the Medical University of Graz. G.d.O.M. is an employee of Adecco Staffing supporting Google LLC. This study was funded by Google LLC. The remaining authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. Yuan Liu, A.J., C.E., D.H.W., K.L., P.B., J.G., V.G., D.A., Yun Liu, R.C.D. and D.C. are inventors on a filed patent related to this work. The authors declare no other competing interests.

Additional information

Peer review information Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of the deep learning system (DLS) and clinicians, broken down for each of the 26 categories of skin conditions and ‘other’.

a, Top-1 and top-3 sensitivity of the DLS on validation set A (n=3,756). b, Top-1 and top-3 sensitivity of the DLS and three types of clinicians, dermatologists (Derm), primary care physicians (PCPs) and nurse practitioners (NPs), on validation set B (n=963). Numbers in parentheses on the x axes indicate the number of cases. A detailed breakdown of performance for each clinician, and for the DLS on the subset of cases graded by each clinician, is provided in Supplementary Table 8. Error bars indicate 95% CIs (see Statistical Analysis).
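
The error bars in this figure are 95% confidence intervals. One plausible way to produce such intervals for a per-condition top-k sensitivity is a case-level nonparametric bootstrap, sketched below in Python; this is an assumption for illustration, not the study's analysis code (see the Statistical Analysis section for the actual procedure).

```python
import random

def top_k_sensitivity(cases, condition, k=3):
    """cases: list of (ranked_differential, reference_condition) pairs."""
    relevant = [c for c in cases if c[1] == condition]
    if not relevant:
        return float('nan')
    return sum(cond in diff[:k] for diff, cond in relevant) / len(relevant)

def bootstrap_ci(cases, condition, k=3, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling cases with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resample = [rng.choice(cases) for _ in cases]
        s = top_k_sensitivity(resample, condition, k)
        if s == s:  # drop resamples containing no cases of this condition
            stats.append(s)
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```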

Extended Data Fig. 2 Performance of the deep learning system (DLS) and the clinicians on the 419-way classification: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set A (n=3,756) and validation set B (n=963).

a, Top-1 and top-3 accuracy for the DLS and clinicians across all cases and 419 categories of skin conditions. b, Average overlap (to assess the full differential diagnosis) of the DLS and clinicians. Error bars indicate 95% confidence intervals (see Statistical Analysis).
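
Average overlap compares two ranked differentials by averaging, over increasing prefix depths, the fraction of conditions the prefixes share. A minimal Python sketch of the standard prefix-averaged definition follows; the truncation depth used here is an illustrative assumption, not necessarily the one used in the study.

```python
def average_overlap(list_a, list_b, depth=3):
    """Mean, over prefix lengths 1..depth, of |top-d(A) ∩ top-d(B)| / d."""
    total = 0.0
    for d in range(1, depth + 1):
        prefix_a, prefix_b = set(list_a[:d]), set(list_b[:d])
        total += len(prefix_a & prefix_b) / d
    return total / depth

# Hypothetical differentials from the DLS and a dermatologist:
dls = ['eczema', 'psoriasis', 'contact dermatitis']
derm = ['psoriasis', 'eczema', 'tinea']
print(round(average_overlap(dls, derm, depth=3), 3))  # 0.556
```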

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–10 and Tables 1–13.

Reporting Summary

About this article

Cite this article

Liu, Y., Jain, A., Eng, C. et al. A deep learning system for differential diagnosis of skin diseases. Nat Med 26, 900–908 (2020). https://doi.org/10.1038/s41591-020-0842-3
