A deep learning system for differential diagnosis of skin diseases

Liu, Yuan; Jain, Ayush; Eng, Clara; Way, David H.; Lee, Kang; Bui, Peggy; Kanada, Kimberly; de Oliveira Marinho, Guilherme; Gallegos, Jessica; Gabriele, Sara; Gupta, Vishakha; Singh, Nalini; Natarajan, Vivek; Hofmann-Wellenhof, Rainer; Corrado, Greg S.; Peng, Lily H.; Webster, Dale R.; Ai, Dennis; Huang, Susan J.; Liu, Yun; Dunn, R. Carter; Coz, David

doi:10.1038/s41591-020-0842-3

Article
Published: 18 May 2020

Yuan Liu ORCID: orcid.org/0000-0003-4079-8275¹,
Ayush Jain¹,
Clara Eng¹,
David H. Way¹,
Kang Lee¹,
Peggy Bui¹,²,
Kimberly Kanada³,
Guilherme de Oliveira Marinho⁴,
Jessica Gallegos¹,
Sara Gabriele¹,
Vishakha Gupta¹,
Nalini Singh¹,⁵,
Vivek Natarajan¹,
Rainer Hofmann-Wellenhof⁶,
Greg S. Corrado¹,
Lily H. Peng¹,
Dale R. Webster¹,
Dennis Ai¹,
Susan J. Huang³,
Yun Liu¹,
R. Carter Dunn¹^na1 &
…
David Coz¹^na1

Nature Medicine volume 26, pages 900–908 (2020)

Abstract

Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are seen instead by general practitioners with lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.
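The top-1 accuracy figures quoted above can be illustrated with a minimal sketch. It assumes a simplified setting in which each case has a single reference label and each reader (or the DLS) returns a ranked differential diagnosis; the function name `top_k_accuracy`, the condition names and the four example cases are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    reference_labels: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of cases whose reference label is among the first k predictions."""
    if len(ranked_predictions) != len(reference_labels):
        raise ValueError("predictions and references must have the same length")
    if not reference_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for predictions, reference in zip(ranked_predictions, reference_labels)
        if reference in predictions[:k]
    )
    return hits / len(reference_labels)


# Hypothetical ranked differentials for four cases (illustration only).
predictions = [
    ["eczema", "psoriasis", "tinea"],
    ["acne", "rosacea", "folliculitis"],
    ["psoriasis", "eczema", "tinea"],
    ["tinea", "eczema", "psoriasis"],
]
references = ["eczema", "rosacea", "tinea", "tinea"]

top1 = top_k_accuracy(predictions, references, k=1)
top3 = top_k_accuracy(predictions, references, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

In this toy example only the first and last cases are correct at rank 1, while all four references appear somewhere in the top three, which is why top-3 accuracy is higher than top-1 accuracy.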

**Fig. 1: Overview of the development and validation of our DLS.**

**Fig. 2: Performance of the DLS and the dermatologists (Derm), primary care physicians (PCPs) and nurse practitioners (NPs).**

**Fig. 3: Representative examples of challenging cases missed by non-dermatologists.**

**Fig. 4: Importance of different inputs to the DLS.**

Data availability

The de-identified teledermatology data used in this study are not publicly available due to restrictions in the data-sharing agreement.

Code availability

The deep learning framework (TensorFlow) used in this study is available at https://www.tensorflow.org/. The training framework (Estimator) is available at https://www.tensorflow.org/guide/estimators. The deep learning architecture (Inception-v4) is available at https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v4.py.

Acknowledgements

We thank W. Chen, J. Yoshimi, X. Ji and Q. Duong for software infrastructure support for data collection. Thanks also go to G. Foti, K. Su, T. Saensuksopa, D. Wang, Y. Gao and L. Tran. We also thank C. Chen, M. Howell and A. Paller for their feedback on the manuscript. Last, but not least, this work would not have been possible without the participation of the dermatologists, primary care physicians and nurse practitioners who reviewed cases for this study, and S. Bis, who helped to establish the skin condition mapping.

Author information

These authors contributed equally: R. Carter Dunn, David Coz.

Authors and Affiliations

Google Health, Palo Alto, CA, USA
Yuan Liu, Ayush Jain, Clara Eng, David H. Way, Kang Lee, Peggy Bui, Jessica Gallegos, Sara Gabriele, Vishakha Gupta, Nalini Singh, Vivek Natarajan, Greg S. Corrado, Lily H. Peng, Dale R. Webster, Dennis Ai, Yun Liu, R. Carter Dunn & David Coz
University of California, San Francisco, San Francisco, CA, USA
Peggy Bui
Advanced Clinical, Deerfield, IL, USA
Kimberly Kanada & Susan J. Huang
Adecco Staffing, Santa Clara, CA, USA
Guilherme de Oliveira Marinho
Massachusetts Institute of Technology, Cambridge, MA, USA
Nalini Singh
Medical University of Graz, Graz, Austria
Rainer Hofmann-Wellenhof

Contributions

Yuan Liu, A.J., C.E., D.H.W., K.L. and D.C. prepared the dataset for usage. S.J.H., K.K. and R.H.-W. provided clinical expertise and guidance for the study. Yuan Liu, A.J., C.E., K.L., P.B., G.d.O.M., J.G., D.A., S.J.H. and K.K. worked on the technical, logistical and quality control aspects of label collection. S.J.H. and K.K. established the skin condition mapping. Yuan Liu, K.L., V.G. and D.C. developed the model. Yuan Liu, A.J., N.S. and V.N. performed statistical analysis and additional analysis. Yun Liu guided study design, analysis of the results and statistical analysis. S.G. studied the potential utility of the model. R.C.D. and D.C. initiated the project and led the overall development, with strategic guidance and executive support from G.S.C., L.H.P. and D.R.W. Yuan Liu, Yun Liu and S.J.H. prepared the manuscript with the assistance and feedback from all other co-authors. K.K. and S.J.H. performed the work at Google Health via Advanced Clinical. G.d.O.M. performed the work at Google Health via Adecco Staffing. N.S. performed the work at Google Health.

Corresponding author

Correspondence to Yun Liu.

Ethics declarations

Competing interests

K.K. and S.J.H. were consultants of Google LLC. R.H.-W. is an employee of the Medical University of Graz. G.d.O.M. is an employee of Adecco Staffing supporting Google LLC. This study was funded by Google LLC. The remaining authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. Yuan Liu, A.J., C.E., D.H.W., K.L., P.B., J.G., V.G., D.A., Yun Liu, R.C.D. and D.C. are inventors on a filed patent related to this work. The authors declare no other competing interests.

Additional information

Peer review information: Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of the deep learning system (DLS) and clinicians, broken down for each of the 26 categories of skin conditions and ‘other’.

a, Top-1 and top-3 sensitivity of the DLS on validation set A (n=3,756). b, Top-1 and top-3 sensitivity of the DLS and three types of clinicians: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set B (n=963). Numbers in parentheses in the x-axes indicate the number of cases. Detailed breakdown of each clinician and the DLS performance on the subset of cases graded by each clinician are in Supplementary Table 8. Error bars indicate 95% CI (see Statistical Analysis).
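The error bars mentioned in this caption are 95% confidence intervals; the caption defers the exact procedure to the Statistical Analysis section, which is not reproduced here. As a hedged illustration only, the sketch below assumes a percentile bootstrap over per-case hit/miss outcomes for one condition. The function name `bootstrap_ci`, the resample count and the example outcomes are assumptions made for this sketch.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    outcomes: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of 0/1 outcomes.

    Assumption: cases are resampled with replacement and the interval is read
    off the empirical percentiles; the study's own procedure may differ.
    """
    if not outcomes:
        raise ValueError("at least one outcome is required")
    rng = random.Random(seed)
    n = len(outcomes)
    # Mean of each resample, sorted so percentiles can be read by index.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical condition with 100 cases, 66 of which were a top-1 hit.
outcomes = [1] * 66 + [0] * 34
point_estimate = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_ci(outcomes)
print(f"top-1 sensitivity {point_estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Conditions with few cases give wide intervals under this scheme, which is one reason to read the per-condition bars together with the case counts shown on the x-axes.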

Extended Data Fig. 2 Performance of the deep learning system (DLS) and the clinicians on the 419-way classification: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set A (n=3,756) and validation set B (n=963).

a, Top-1 and top-3 accuracy for the DLS and clinicians across all cases and 419 categories of skin conditions. b, Average overlap (to assess the full differential diagnosis) of the DLS and clinicians. Error bars indicate 95% confidence intervals (see Statistical Analysis).
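Average overlap compares two ranked lists rather than only their top entries. The sketch below uses a common textbook form of the measure: the agreement between the first d items of each list, averaged over depths d = 1 to k. Whether the study uses exactly this variant (for example, how it handles ties or lists of unequal length) is not stated in this caption, so the function `average_overlap` and its `depth` parameter should be read as assumptions.

```python
from typing import Sequence


def average_overlap(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    depth: int = 3,
) -> float:
    """Mean over d = 1..depth of |A[:d] ∩ B[:d]| / d for two ranked lists."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    total = 0.0
    for d in range(1, depth + 1):
        # Items the two differentials share within their first d positions.
        shared = set(ranking_a[:d]) & set(ranking_b[:d])
        total += len(shared) / d
    return total / depth


# Hypothetical differentials: same three conditions, top two swapped.
reader = ["eczema", "psoriasis", "tinea"]
reference = ["psoriasis", "eczema", "tinea"]

swapped = average_overlap(reader, reference, depth=3)
identical = average_overlap(reader, reader, depth=3)
disjoint = average_overlap(reader, ["acne", "rosacea", "melasma"], depth=3)
print(f"swapped: {swapped:.3f}, identical: {identical:.1f}, disjoint: {disjoint:.1f}")
```

Because shallow depths contribute to every later term, disagreement at rank 1 is penalized more than disagreement at rank 3, so the measure rewards getting the leading diagnosis right while still crediting the rest of the differential.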

Supplementary information

Supplementary Methods, Figs. 1–10 and Tables 1–13.

About this article

Cite this article

Liu, Y., Jain, A., Eng, C. et al. A deep learning system for differential diagnosis of skin diseases. Nat Med 26, 900–908 (2020). https://doi.org/10.1038/s41591-020-0842-3

Received: 11 September 2019
Accepted: 19 March 2020
Published: 18 May 2020
Issue Date: June 2020
DOI: https://doi.org/10.1038/s41591-020-0842-3
