Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. This patient-level contextual information is frequently used by clinicians to diagnose melanoma and is especially useful in ruling out false positives in patients with many atypical nevi. The dataset represents 2,056 patients (20.8% with at least one melanoma, 79.2% with zero melanomas) from three continents with an average of 16 lesions per patient, consisting of 33,126 dermoscopic images and 584 (1.8%) histopathologically confirmed melanomas compared with benign melanoma mimickers.
melanoma • Skin Lesion
Dermoscopy • digital curation
approximate age • sex • anatomic site
Sample Characteristic - Organism
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13070345
Background & Summary
Artificial intelligence (AI) use in medical imaging is rapidly progressing and has the potential to reduce melanoma-associated mortality, morbidity, and healthcare costs by improving access to expertise, diagnostic accuracy, and screening efficiency1,2,3. Here we present a dermatology image dataset that includes patient- and lesion-related clinical context, which can be used in studies to examine whether this additional information further improves recognition performance.
Recent studies have demonstrated the ability of AI algorithms to match, if not outperform, clinicians in the diagnosis of individual skin lesion images in controlled reader studies. Algorithms derived from the 2018 ISIC Grand Challenge have been shown to outperform over 500 clinical readers and experts in such a reader study1. However, the reader study did not accurately reflect clinical scenarios where clinicians have access to examine all lesions on a patient.
Clinicians frequently assess skin lesions for biopsy by assessing them in context with the rest of the lesions on a given patient’s body, taking into consideration the individual “biologic skin ecosystem”. As demonstrated in Fig. 1, a lesion with malignancy-predictive features among many similar lesions is thought not to be as dangerous as an odd lesion on a patient whose other lesions are more benign looking. The latter is known in dermatology as the “ugly duckling sign” and is frequently used to diagnose melanoma, especially in patients with multiple melanocytic lesions4,5. Until now, the ugly duckling concept has not been explored with machine learning due to the lack of large datasets with multiple labeled images per patient. Here, we present the first dataset of melanoma and comparative lesions from the same patient to support new machine learning challenges. This dataset is composed of 33126 images collected from 2056 patients at multiple centers around the world such as Memorial Sloan Kettering Cancer Center, New York; the Melanoma Institute Australia and the Melanoma Diagnosis Centre, Sydney; the University of Queensland, Brisbane; the Medical University of Vienna, Vienna; and Hospital Clínic de Barcelona, Barcelona. In this article, we present the methods by which we created this multicenter dataset with clinical contextual information.
We queried clinical imaging databases across the six centers to generate a multicenter imaging dataset. Among patients with dermoscopy imaging from 1998 to 2020, those with multiple skin lesions were identified. Histopathology reports corresponding to internal biopsied lesions were reviewed for diagnosis labelling. Non-biopsied lesions that were monitored for at least six months were considered benign without further granularity6. Patients with appropriate qualifying diagnoses: melanoma or benign lesions that could be considered melanoma mimickers including nevi, atypical melanocytic proliferation, café-au-lait macule, lentigo NOS, lentigo simplex, solar lentigo, lichenoid keratosis, and seborrheic keratosis were included7,8,9. Lesions satisfying the described criteria were represented in the dataset with a single dermoscopic image8,10,11. These include images captured with or without polarized light using a contact or noncontact dermatoscope. When multiple image types were available, the selected image was either the one of highest resolution or if multiple images at the same resolution were available, one was chosen randomly. Images containing any potentially identifying features, such as jewelry or tattoos, or from patients without at least three qualifying images were excluded during quality assurance review.
In order to test algorithm generalizability, a subset of images were allocated for the testing dataset of the 2020 ISIC Grand Challenge12. These images included contributions from all sites included in the training set and a held out set of cases from the Andreas Syngros Hospital of Cutaneous & Venereal Diseases, Athens, Greece. These test images are available for download, but the test labels are not yet public due to planned future challenges and experiments.
A software annotation tool, called ‘Tagger,’ was developed internally to review diagnostic labeling of grouped images (https://github.com/dgutman/webix_image_organizer). Using this tool, dermoscopy expert reviewers (EC, OR) were presented sets of 30 images with a shared diagnosis in order to identify the ones with erroneous labeling. Reviewers invested 22 hours over three weeks of quality assurance in ‘Tagger’ and spent an average of 4 seconds per set when flagging a single image, and 11 seconds per set when flagging several images. Out of all images reviewed in Tagger, 2.7% were removed, out of concern for erroneous labels.
Memorial Sloan Kettering Cancer Center
The MSK Dermatology Service is a high-risk clinic that relies heavily on imaging for high risk individuals with or without a history of melanoma13. Images were acquired using a dermoscopic attachment to either a digital single reflex lens (SLR) camera or to a smartphone. Each lesion was imaged with polarized and/or nonpolarized dermoscopy. For each lesion, 3–5 images are collected during each patient visit and stored in a specialized image database called VectraTM (Canfield Scientific Inc., Parsippany, NJ, USA).
Images were extracted after searching the database for patients with multiple lesions imaged and who had biopsy confirmed melanoma from 2015–2019. The clearest image per time point was selected by medical student research fellows using a selection tool designed uniquely for the task (SM, JN, OR, EC) (https://github.com/ISIC-Research/lesionimagepicker). Images were collected and shared with institutional review board approval number 16–974.
Hospital Clínic Barcelona
The Department of Dermatology of the Hospital Clínic of Barcelona is a tertiary referral center for melanoma patients, includes a high-risk melanoma patient clinic. The dermatology department is equipped with the digital dermatoscopy system MoleMaxTM HD (Derma Medical Systems, Vienna, Austria) and corresponding image database to store the collected images. Each lesion was photographed using polarized dermoscopy.
Candidate images were extracted after searching the database for benign lesions with >1.5 years of digital dermoscopy follow-up and excised lesions with a histopathology report between the years 1998 and 2020. The images were examined by three expert dermatologists at the clinic for image quality assurance and label accuracy. From a series of multiple sequential images of the same nevus, we extracted the median timepoint. Images were collected and shared with institutional ethics approval number HCP/2019/0413.
The University of Queensland
The Clinical Research Facility of the Translational Research Institute in Brisbane, Queensland, Australia is the clinical trial site following both general population and high-risk individuals participating in studies carried out by the Dermatology Research Center of The University of Queensland Diamantina Institute. Contributed images came from three prospective longitudinal studies. The first study, “Changing Naevi Study”, consisted of two groups of participants; advanced stage (III – IV) melanoma patients undergoing treatment with immunotherapy and/or targeted therapy and people at high risk of developing melanoma due to personal or family history but were not undergoing treatment at time of enrollment. Ethics approval was obtained from the Human Research Ethics Committees of Metro South Health (HREC/16/QPAH/37) and The University of Queensland (2016000429). The second study, “Mind Your Moles”, consisted of participants from a general population cohort recruited from the Brisbane Electoral Role14. All nevi >5 mm were imaged in these two studies, as well as any lesions of interest/concern to the participant or clinician. This study has been approved by the Metro South Health Human Research Ethics Committee on April 21, 2016 (approval number: HREC/16/QPAH/125). Ethics approval has also been obtained from the University of Queensland Human Research Ethics Committee (approval number: 2016000554), Queensland University of Technology Human Research Ethics Committee (approval number: 1600000515), and QIMR Berghofer (approval number: P2271). The third study, “Evaluation of the Efficacy of 3D Total-Body Photography With Sequential Digital Dermoscopy in a High-Risk Melanoma Cohort”, consisted of participants at high risk of melanoma15, half of which underwent imaging intervention16. Lesions of interest to the participant or clinician were imaged dermoscopically. This study has received Human Research Ethics Committee (HREC) approval from both Metro South Health HREC (HREC/17/QPAH/816) and The University of Queensland HREC (2018000074).
All images used for the studies were extracted from the VectraTM image database and were captured between the years 2016 and 2020 (Canfield Scientific Inc., Parsippany, NJ, USA).
Medical University Vienna
The Early Recognition Unit of the Department of Dermatology of Medical University of Vienna is a tertiary referral center for high-risk patients. It offers total digital dermatoscopic follow-up to patients with multiple nevi17. Most patients in the program are of European descent with fair skin types (usually skin type 1–3) and have a high number of nevi and a personal or family history of melanoma.
We extracted polarized dermoscopic images from 2015–2019 which were stored in the MoleMax HD System (Derma Medical Systems, Vienna, Austria). We searched the database of this system for patients with at least 3 dermoscopic images by filtering SQL-tables with a proprietary tool provided by the manufacturer. From these patients we selected all benign melanocytic lesions with >1 year follow-up and all lesions that were excised. Histopathology reports were matched manually to all excised lesions. Non-melanocytic lesions, duplicate images, images captured before 2015 with older systems, low-quality images, and images that depicted only parts of the lesion were excluded. Furthermore, we excluded images of lesions that were already included in the 2018 or 2019 ISIC challenges. Images were collected and shared with institutional ethics approval number 1804/2017.
Melanoma Institute Australia and the Sydney Melanoma Diagnosis Centre
Both services are high-risk, tertiary referral dermatology clinics that rely heavily on imaging of individuals with or without a history of melanoma13. Lesions imaged for short term monitoring are selected at the discretion of the clinician or which are of concern to the patient. Additionally, all lesions are imaged prior to surgical removal. Images are acquired using a dermoscopic attachment to either a digital single reflex lens (SLR) camera or to a smartphone and stored in DermEngineTM (Metaoptima, Vancouver, British Columbia, Canada). Histopathology reports between 2008 and 2019 were reviewed and lesions followed for six months or more without malignant changes were considered benign. All images were manually reviewed to assure de-identification and image detail quality after database extraction. Images were collected and shared with institutional ethics approval number X20-0241 & 2020/ETH01411: Melanoma Image Annotation and Analysis Collaboration project.
The quality assurance and collection steps we performed for curating the images from various sources are detailed in Fig. 2.
Each lesion in the dataset is represented by a single image. The image of non-biopsied benign lesions with imaging at multiple time points were selected to minimize the difference in patient imaging date variability and date range between patients with and without an imaged melanoma. This was performed to reduce potential bias in image lighting, camera type, or other factors between the benign and melanoma patient class.
Lesion context images
Due to the retrospective nature of image acquisition and potential surveillance bias in different patient populations, the number of lesions per patient was not distributed identically between the class of patients with a melanoma image and the class without a melanoma image. Because the lesions in this dataset do not represent all lesions that exist on this set of patients, it is possible the imbalance is related to selection bias of imaged lesions. Lesions in both classes were subsampled through patient matching, which led to a loss of 4.1% of images. Ultimately, 50% of the patients have more than 10 contextual lesions. The matched number of images per patient ID before and after subsampling is shown in Fig. 3.
Due to a clerical error during the data ingestion process to the ISIC Archive, 425 pixelwise identical duplicate images were ingested and included in the dataset. The duplicates are included in the data to mirror the dataset used for the 2020 SIIM-ISIC Melanoma Classification competition. In order to preserve fidelity with the dataset that was used in the competition, the dataset itself has not been modified; however, lesion identifiers were made available as a metadata field and a comma-separated value file is available at the dataset landing page which highlights the image identifiers of the duplicates.
The dataset was made available for download through the Kaggle platform as part of a live competition from May 27, 2020 through August 20, 2020. It is released under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, and is permanently accessible to the public through the ISIC Archive12 at this https://doi.org/10.34970/2020-ds01. Currently no modifications have been made to the dataset, however, any metadata or image modifications will be noted at that DOI landing page.
Training images consisted of 12,743,090 pixels on average but ranged from 307,200 to 24,000,000. Metadata for each image included approximate patient age at time of image capture, biological sex, general anatomic site of the lesion, anonymized patient identification number, benign/malignant category, and the specific diagnosis if one was available based on an acceptable ground truth confirmation method. A summary of the characteristics of the dataset at patient- and lesion-level is shown in Table 1.
This dataset is adjoined by a test which determined the 2020 ISIC Grand Challenge leaderboard scoring. Test images and associated metadata are available for download through the ISIC Archive at the above listed DOI, though diagnostic labels remain undisclosed at this time until further notice to serve as the basis for scoring future competitions.
The dataset is available in two formats.
The first is the file format described in Part 10 of Digital Imaging and Communication in Medicine (DICOM) standard18,19, which is currently being developed for dermatology. The DICOM standard is a comprehensive, international medical image standard that was originally developed for radiology, where it has become ubiquitous as the core standard. It has since been adopted by many other medical imaging specialties including ophthalmology, dentistry, cardiology, nuclear medicine, oncology, pathology, surgical specialties who perform image-guided surgery (e.g., neurosurgery, ENT, orthopedics), and specialties that acquire endoscopic or laparoscopic imaging20. The DICOM file format is an amalgamation of the metadata and pixel data in a single file. The pixel data is encoded in Joint Photographic Expert Group (JPEG) format. While imaging technology changed over the time period from which images were selected and continue to change over time, the devices were not recorded upon image capture, and perhaps in the future when the DICOM standard is more widely adopted this may be metadata that could be collected and provided.
The second format is where the images are in JPEG format and the metadata is included in a linked comma-separated values (CSV) file.
The ground truth labels for all malignant lesions in the dataset were confirmed via retrospective review of histopathology reports, and diagnosis plausibility was visually confirmed by visual confirmation of a dermoscopy expert. Histopathology reports were double checked if the label was suspicious. Melanoma in situ and invasive melanoma were both coded as melanoma. All other qualifying images were coded as benign, including those diagnosed as severely dysplastic nevi21,22.
Non-biopsied lesions with expert consensus agreement and lesions followed for six months or more without malignant changes were labelled benign without a more specific diagnosis by most contributors. Dermatofibromas, seborrheic keratosis, or vascular lesions were not monitored, as that would not reflect clinical practice, but labels were verified visually by an expert in dermoscopy. Images of lesions were attributed to patients based on the clinical imaging database identification codes which are stored at the time of capture during each clinical photography session.
This dataset mimics clinical practice by labeling images from the same patient (mean = 16, median = 12, standard deviation = 16) as such and allows algorithms to assess multiple images from the same patient for malignancy. It addresses a particularly challenging area of clinical practice, those patients with multiple atypical nevi suspicious for malignancy. The dataset is designed to improve translational potential of algorithms, especially to help clinicians without access to tertiary referral centers assess high risk patients with multiple atypical nevi. Additionally, algorithms developed using this dataset may be better candidates for incorporating into dermatology imaging systems, as they can evaluate all images for a given patient in context, and perhaps even be used during clinic visits in which multiple lesions are imaged. Given the translational potential of algorithms developed using this dataset, we hope that generating a public, well annotated dataset that mimics clinical practice will lead to prospective studies of promising automated approaches for diagnosing melanoma.
Various forms of dermoscopy imaging are included in the dataset: contact non-polarized light, contact polarized light, and non-contact polarized light. Deeper skin structures are more often visible under polarized light than non-polarized light, even without direct skin contact with the interface or the use of a liquid interface23. Various colors, structures, and patterns are more pronounced, or accessible, under specific forms of dermoscopy24. Imaging modalities are not equivalent in identifying certain morphologies but complement one another in a holistic clinical assessment. A limitation to this dataset is that each lesion is represented by a single image and type of dermoscopy, which may not reflect the full spectrum of information that would be used by a clinician20.
Generalization of AI-assisted skin lesion classification to broad clinical use depends on the demographic agreement of the training dataset to the clinical population. Due to low population prevalence and challenges with access to care in different populations, the images gathered for large datasets such as this for AI classification have a strong tendency to under-represent darker skin types. This may lead to either overdiagnosis or underdiagnosis of melanomas in darker skin types, both of which would have significant clinical implications and will require prospective study. The ISIC Archive is actively pursuing methods by which to increase the diversity of images obtained, but at this point caution should be used when attempting to generalize algorithms trained on images from specialized referral centers (such as the dataset described herein) to the global population at large. The dataset is also enriched for melanoma in general and does not represent true incidence of melanoma.
Custom generated code for the described methods is available at https://github.com/ISIC-Research/2020-Challenge-Curation.
Tschandl, P., Argenziano, G., Razmara, M. & Yap, J. Diagnostic accuracy of content-based dermatoscopic image retrieval with deep classification features. Br J Dermatol. 181, 155–165 (2019).
Tschandl, P. et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 20, 938–947 (2019).
Tschandl, P. et al. Human–computer collaboration for skin cancer recognition. Nat Med. 26, 1229–1234 (2020).
Gaudy-Marqueste, C. et al. Ugly Duckling Sign as a Major Factor of Efficiency in Melanoma Detection. JAMA Dermatol. 153, 279–284 (2017).
Scope, A. et al. The “Ugly Duckling” Sign: Agreement Between Observers. Arch Dermatol. 144, 58–64 (2008).
Moscarella, E. et al. Both short-term and long-term dermoscopy monitoring is useful in detecting melanoma in patients with multiple atypical nevi. J Eur Acad Dermatol Venereol. 31, 247–251 (2017).
Carrera, C. et al. Dermoscopic Clues for Diagnosing Melanomas That Resemble Seborrheic Keratosis. JAMA Dermatol. 153, 544–551 (2017).
Argenziano, G. et al. Early diagnosis of melanoma: what is the impact of dermoscopy? Dermatol Ther. 25, 403–409 (2012).
Kaminska-Winciorek, G. & Wydmański, J. Benign simulators of melanoma on dermoscopy – black colour does not always indicate melanoma. J Pre-Clin Clin Res. 7, 6–12 (2013).
Argenziano, G. et al. Dermoscopy of pigmented skin lesions: results of a consensus meeting via the Internet. J Am Acad Dermatol. 48, 679–693 (2003).
Bafounta, M. L., Beauchet, A., Aegerter, P. & Saiag, P. Is dermoscopy (epiluminescence microscopy) useful for the diagnosis of melanoma? Results of a meta-analysis using techniques adapted to the evaluation of diagnostic tests. Arch Dermatol. 137, 1343–1350 (2001).
International Skin Imaging Collaboration. SIIM-ISIC 2020 Challenge Dataset. International Skin Imaging Collaboration https://doi.org/10.34970/2020-ds01 (2020).
Halpern, A. C., Marchetti, M. A. & Marghoob, A. A. Melanoma Surveillance in “High-Risk” Individuals. JAMA Dermatol. 150, 815–816 (2014).
Koh, U. et al. ‘Mind your Moles’ study: protocol of a prospective cohort study of melanocytic naevi. BMJ Open. 8 (2018).
Rastrelli, M., Tropea, S., Rossi, C. R. & Alaibac, M. Melanoma: epidemiology, risk factors, pathogenesis, diagnosis and classification. In Vivo. 28, 1005–1011 (2014).
Primiero, C. A. et al. Evaluation of the efficacy of 3D total-body photography with sequential digital dermoscopy in a high-risk melanoma cohort: protocol for a randomised controlled trial. BMJ Open. 9 (2019).
Rinner, C., Tschandl, P., Sinz, C. & Kittler, H. Long-term evaluation of the efficacy of digital dermatoscopy monitoring at a tertiary referral center. J Dtsch Dermatol Ges. 15, 517–522 (2017).
National Electrical Manufacturers Association. Digital Imaging and Communications in Medicine (DICOM) Supplement 221: Dermoscopy https://www.dicomstandard.org/News/current/docs/sups/sup221.pdf (2019).
National Electrical Manufacturers Association. Digital Imaging and Communications in Medicine (DICOM) Standard PS3.10 2018d - Media Storage and File Format for Media Interchange. http://dicom.nema.org/medical/dicom/2018d/output/html/part10.html (2018).
Caffery, L. J. et al. Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards. J Digit Imaging. 31, 568–577 (2018).
Lott, J. P. et al. Evaluation of the Melanocytic Pathology Assessment Tool and Hierarchy for Diagnosis (MPATH-Dx) classification scheme for diagnosis of cutaneous melanocytic neoplasms: Results from the International Melanoma Pathology Study Group. J Am Acad Dermatol. 75, 356–363 (2016).
Piepkorn, M. W. et al. The MPATH-Dx reporting schema for melanocytic proliferations and melanoma. J Am Acad Dermatol. 70, 131–141 (2014).
Marghoob, A. A. et al. Instruments and new technologies for the in vivo diagnosis of melanoma. J Am Acad Dermatol. 49, 777–797; quiz 798–799 (2003).
Benvenuto-Andrade, C. et al. Differences Between Polarized Light Dermoscopy and Immersion Contact Dermoscopy for the Evaluation of Skin Lesions. Arch Dermatol. 143, 329–338 (2007).
The dataset provided by The University of Queensland in Brisbane was funded by the National Health and Medical Research Council (NHMRC) – Centre of Research Excellence Scheme (APP 1099021). HPS holds an NHMRC MRFF Next Generation Clinical Researchers Program Practitioner Fellowship (APP1137127). Other funding sources include the Melanoma Research Alliance Young Investigator Award 614197. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30 CA008748.
H.P.S. is a shareholder of MoleMap N.Z. Limited and e-derm consult GmbH, and undertakes regular teledermatological reporting for both companies. H.P.S. is a Medical Consultant for Canfield Scientific Inc., Revenio Research Oy and also a Medical Advisor for First Derm. AH serves as a consultant to Canfield Scientific Inc., and is on the SciBase advisory panel.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rotemberg, V., Kurtansky, N., Betz-Stablein, B. et al. A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Sci Data 8, 34 (2021). https://doi.org/10.1038/s41597-021-00815-z
This article is cited by
Skin-Net: a novel deep residual network for skin lesions classification using multilevel feature extraction and cross-channel correlation with detection of outlier
Journal of Big Data (2023)
A dataset of skin lesion images collected in Argentina for the evaluation of AI tools in this population
Scientific Data (2023)
Nature Biomedical Engineering (2023)
Prospective validation of dermoscopy-based open-source artificial intelligence for melanoma diagnosis (PROVE-AI study)
npj Digital Medicine (2023)