The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

Tschandl, Philipp; Rosendahl, Cliff; Kittler, Harald

doi:10.1038/sdata.2018.161

Data Descriptor
Open access
Published: 14 August 2018

Scientific Data volume 5, Article number: 180161 (2018)

Abstract

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.

Design Type(s)	database creation objective • data integration objective • image format conversion objective
Measurement Type(s)	skin lesions
Technology Type(s)	digital curation
Factor Type(s)	diagnosis • Diagnostic Procedure • age • biological sex • animal body part
Sample Characteristic(s)	Homo sapiens • skin of body

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Dermatoscopy is a widely used diagnostic technique that improves the diagnosis of benign and malignant pigmented skin lesions in comparison to examination with the unaided eye¹. Dermatoscopic images are also a suitable source to train artificial neural networks to diagnose pigmented skin lesions automatically. In 1994, Binder et al.² already used dermatoscopic images successfully to train an artificial neural network to differentiate melanomas, the deadliest type of skin cancer, from melanocytic nevi. Although the results were promising, the study, like most earlier studies, suffered from a small sample size and the lack of dermatoscopic images other than melanoma or nevi. Recent advances in graphics card capabilities and machine learning techniques set new benchmarks with regard to the complexity of neural networks and raised expectations that automated diagnostic systems will soon be available that diagnose all kinds of pigmented skin lesions without the need of human expertise³.

Training of neural-network based diagnostic algorithms requires a large number of annotated images⁴ but the number of high quality dermatoscopic images with reliable diagnoses is limited or restricted to only a few classes of diseases.

In 2013 Mendonça et al. made 200 dermatoscopic images available as the PH2 dataset including 160 nevi and 40 melanomas⁵. Pathology was the ground truth for melanomas but not available for most nevi. Because the set is publicly available (http://www.fc.up.pt/addi/) and includes comprehensive metadata it served as a benchmark dataset for studies of the computer diagnosis of melanoma until now.

Accompanying the book Interactive Atlas of Dermoscopy⁶ a CD-ROM is commercially available with digital versions of 1044 dermatoscopic images including 167 images of non-melanocytic lesions, and 20 images of diagnoses not covered in the HAM10000 dataset. Although this is one of the most diverse available datasets in regard to covered diagnoses, its use is probably limited because of its constrained accessibility.

The ISIC archive (https://isic-archive.com/) is a collection of multiple databases and currently includes 13786 dermatoscopic images (as of February 12th 2018). Because of permissive licensing (CC-0), well-structured availability, and large size it is currently the standard source for dermatoscopic image analysis research. It is, however, biased towards melanocytic lesions (12893 of 13786 images, about 93.5%, are nevi or melanomas). Because this portal is the most comprehensive, technically advanced, and accessible resource for digital dermatoscopy, we will provide our dataset through the ISIC archive.

Because of the limitations of available datasets, past research focused on melanocytic lesions (i.e., the differentiation between melanoma and nevus) and disregarded non-melanocytic pigmented lesions although they are common in practice. The mismatch between the small diversity of available training data and the variety of real-life data resulted in a moderate performance of automated diagnostic systems in the clinical setting despite excellent performance in experimental settings³,⁵,⁷,⁸. Building a classifier for multiple diseases is more challenging than binary classification⁹. Currently, reliable multi-class predictions are only available for clinical images of skin diseases but not for dermatoscopic images¹⁰,¹¹.

To boost the research on automated diagnosis of dermatoscopic images we released the HAM10000 (“Human Against Machine with 10000 training images”) dataset. The dataset will be provided to the participants of the ISIC 2018 classification challenge hosted by the annual MICCAI conference in Granada, Spain, but will also be available to research groups who do not participate in the challenge. Because we will also use this dataset to collect and provide information on the performance of human expert diagnosis, it could serve as benchmark set for the comparisons of humans and machines in the future. In order to provide more information to machine-learning research groups who intend to use the HAM10000 training set for research we describe the evolution and the specifics of the dataset (Fig. 1) in detail.

Methods

The 10015 dermatoscopic images of the HAM10000 training set were collected over a period of 20 years from two different sites, the Department of Dermatology at the Medical University of Vienna, Austria, and the skin cancer practice of Cliff Rosendahl in Queensland, Australia. The Australian site stored images and meta-data in PowerPoint files and Excel databases. The Austrian site started to collect images before the era of digital cameras and stored images and metadata in different formats during different time periods.

Extraction of images and meta-data from PowerPoint files

Each PowerPoint file contained consecutive clinical and dermatoscopic images of one calendar month of clinical workup, where each slide contained a single image and a text-field with a unique lesion identifier. Because of the large amount of data we applied an automated approach to extract and sort those images. We used the Python package python-pptx to access the PowerPoint files and to obtain the content. We iterated through each slide and automatically extracted and stored the source image, the corresponding identifier, and the year of documentation, which was part of the file name.
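The extraction loop described above can be sketched roughly as follows. This is a minimal illustration and not the authors' script: the function names, the output naming scheme, and the assumption that the year appears as a four-digit number in the file name are hypothetical. It relies only on the python-pptx calls `Presentation`, `shape.shape_type`, `shape.image.blob`, and `shape.image.ext`.

```python
import re
from pathlib import Path


def year_from_filename(name):
    """Return the first four-digit year (19xx or 20xx) in a file name, or None.

    Assumption: the year of documentation is written as four digits.
    """
    match = re.search(r"(?:19|20)\d{2}", name)
    return int(match.group(0)) if match else None


def extract_slides(pptx_path, out_dir):
    """Yield (lesion_id, year, image_path) for each slide with one picture.

    Requires the third-party python-pptx package, imported lazily so the
    helper above stays usable without it.
    """
    from pptx import Presentation
    from pptx.enum.shapes import MSO_SHAPE_TYPE

    pptx_path, out_dir = Path(pptx_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    year = year_from_filename(pptx_path.name)

    slides = Presentation(str(pptx_path)).slides
    for index, slide in enumerate(slides, start=1):
        pictures = [s for s in slide.shapes
                    if s.shape_type == MSO_SHAPE_TYPE.PICTURE]
        texts = [s.text_frame.text.strip() for s in slide.shapes
                 if s.has_text_frame and s.text_frame.text.strip()]
        if len(pictures) != 1 or not texts:
            continue  # slide does not match the one-image-plus-identifier layout
        image = pictures[0].image
        target = out_dir / f"{pptx_path.stem}_{index:04d}.{image.ext}"
        target.write_bytes(image.blob)  # original embedded image bytes
        yield texts[0], year, target
```

Writing `image.blob` stores the embedded source image unchanged, so no recompression takes place at this step.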

Digitization of diapositives

Before the introduction of digital cameras, dermatoscopic images at the Department of Dermatology in Vienna, Austria were stored as diapositives. We digitized the diapositives with a Nikon Coolscan 5000 ED scanner with a two-fold scan with Digital ICE and stored files as JPEG Images (8-bit color depth) in highest quality (300DPI; 15×10 cm). We manually cropped the scanned images with the lesion centered to 800x600px at 72DPI, and applied manual histogram corrections to enhance visual contrast and color reproduction (Fig. 2).

**Figure 2: Manual correction of a scanned diapositive.**
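Cropping was done by hand, but the geometry of a lesion-centered 800x600 px window can be illustrated with a small helper. This is a simplified sketch under stated assumptions: the lesion center is supplied by the user, the window is not rescaled, and the function name and parameters are hypothetical.

```python
def crop_box(width, height, center_x, center_y, crop_w=800, crop_h=600):
    """Return (left, top, right, bottom) of a crop window around a point.

    The window is shifted where necessary so that it stays inside the image.
    """
    if crop_w > width or crop_h > height:
        raise ValueError("image is smaller than the requested crop")
    left = min(max(center_x - crop_w // 2, 0), width - crop_w)
    top = min(max(center_y - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h
```

A box in this form can be passed to an image library's crop routine; a lesion near the border yields a window that is pushed back into the frame rather than padded.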

Extraction of data from a digital dermatoscopy system

The Department of Dermatology at the Medical University of Vienna is equipped with the digital dermatoscopy system MoleMax HD (Derma Medical Systems, Vienna, Austria). We extracted cases from this system by filtering SQL-tables with a proprietary tool provided by the manufacturer. We selected only non-melanocytic lesions with a consensus benign diagnosis, nevi with >1.5 years of digital dermatoscopic follow-up, and excised lesions with a histopathologic report. Histopathologic reports were matched manually to specific lesions. From a series of multiple sequential images of the same nevus we extracted only the most recent one. Some melanomas of this set were also photographed with a DermLite™ FOTO (3Gen™) camera. These additional images also became part of the ViDIR image series, where different images of the same lesion were labeled with a common identifier string. Original images of the MoleMax HD system had a resolution of 1872x1053px with non-square pixels. We manually cropped all MoleMax HD images to 800x600px (72DPI), centered the lesion if necessary, and converted the format back to square pixels.
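Keeping only the most recent image from a series of sequential images of the same nevus amounts to a group-wise maximum over the acquisition date. The sketch below assumes the records are available as `(lesion_id, acquisition_date, image_name)` tuples; this record layout and the function name are hypothetical, since the actual export came from a proprietary tool.

```python
from datetime import date


def most_recent_per_lesion(records):
    """Map each lesion identifier to the name of its latest image."""
    latest = {}
    for lesion_id, taken_on, image_name in records:
        if lesion_id not in latest or taken_on > latest[lesion_id][0]:
            latest[lesion_id] = (taken_on, image_name)
    return {lesion_id: name for lesion_id, (_, name) in latest.items()}


# Hypothetical follow-up series for two nevi.
series = [
    ("nevus-1", date(2015, 3, 1), "a.jpg"),
    ("nevus-1", date(2016, 9, 12), "b.jpg"),
    ("nevus-2", date(2016, 1, 20), "c.jpg"),
]
selected = most_recent_per_lesion(series)
```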

Filtering of dermatoscopic images

The source image collections of both sites contained not only dermatoscopic images but also clinical close-ups and overviews. Because there was no reliable annotation of the imaging type, we had to separate the dermatoscopic images from the others. To deal with the large amount of data efficiently we developed an automated method to screen and categorize >30000 images, similar to Han et al.¹². We hand-labeled 1501 image files of the Australian collection into the categories "overviews", "close-ups" and "dermatoscopy". Using the hand-labeled images as a training set, we fine-tuned an InceptionV3 architecture¹³ (weights pre-trained on ImageNet⁴ data) to classify the images according to image type. After training for 20 epochs with Stochastic Gradient Descent, with a learning rate initialized at 0.0003, a step-down (gamma 0.1) at epochs 7 and 13, and a batch size of 64, we obtained a top-1 accuracy of 98.68% on our hold-out test set. This accuracy was sufficient to accelerate the selection process of dermatoscopic images. The few remaining misclassified close-ups and overviews were removed by hand in a second revision.
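The step-down learning-rate schedule can be written out explicitly. The sketch below uses only the hyperparameters stated above (initial rate 0.0003, gamma 0.1, steps at epochs 7 and 13, 20 epochs). Whether epochs are counted from zero is not stated in the text, so zero-based counting is an assumption here. The training framework is not specified either, and the function is framework-independent.

```python
import math

BASE_LR = 0.0003       # initial learning rate
GAMMA = 0.1            # multiplicative step-down factor
MILESTONES = (7, 13)   # epochs at which the rate is reduced
EPOCHS = 20


def learning_rate(epoch, base_lr=BASE_LR, gamma=GAMMA, milestones=MILESTONES):
    """Learning rate for a zero-based epoch index (assumed convention)."""
    steps = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** steps


schedule = [learning_rate(epoch) for epoch in range(EPOCHS)]
```

Under this convention the rate is 3e-4 for epochs 0 to 6, 3e-5 for epochs 7 to 12, and 3e-6 from epoch 13 onward.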

Unifying pathologic diagnoses

Histopathologic diagnoses showed high variability within and between sites including typos, different dermatopathologic terminologies, multiple diagnoses per lesion or uncertain diagnoses. Cases with uncertain diagnoses and collisions were excluded except for melanomas in association with a nevus.

We unified the diagnoses and formed seven generic classes, and specifically avoided ambiguous classifications. The histopathologic expression "superficial spreading melanoma in situ, arising in a preexisting dermal nevus", for example, should only be allocated to the "melanoma" class and not to the nevus class. The seven generic classes were chosen for simplicity and in regard of the intended use as a benchmark dataset for the diagnosis of pigmented lesions by humans and machines. The seven classes covered more than 95% of all pigmented lesions examined in daily clinical practice of the two study sites. A more detailed description of the disease classes is given in the usage notes below.
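The allocation rule can be illustrated with a small keyword-based mapping. This is a toy sketch and not the procedure the authors used: the keyword lists and uncertainty markers are hypothetical, and the class codes are the abbreviations listed in the usage notes. It encodes only the two rules stated in the text: uncertain diagnoses and collisions are excluded, and a melanoma associated with a nevus is allocated to the melanoma class.

```python
# Hypothetical keyword lists; real reports need far more careful handling.
KEYWORDS = {
    "mel": ("melanoma",),
    "bcc": ("basal cell carcinoma",),
    "akiec": ("actinic keratosis", "bowen", "intraepithelial carcinoma"),
    "bkl": ("seborrheic keratosis", "solar lentigo",
            "lichen planus-like keratosis"),
    "df": ("dermatofibroma",),
    "vasc": ("angioma", "angiokeratoma", "pyogenic granuloma", "hemorrhage"),
    "nv": ("nevus",),
}
UNCERTAIN_MARKERS = ("cannot rule out", "favor")


def unify(diagnosis):
    """Return one of the seven class codes, or None if the case is excluded."""
    text = diagnosis.lower()
    if any(marker in text for marker in UNCERTAIN_MARKERS):
        return None  # uncertain diagnosis
    matched = {label for label, words in KEYWORDS.items()
               if any(word in text for word in words)}
    if matched in ({"mel"}, {"mel", "nv"}):
        return "mel"  # melanoma in association with a nevus stays melanoma
    if len(matched) == 1:
        return next(iter(matched))
    return None  # collision of classes, or no class matched
```

For example, `unify("Superficial spreading melanoma in situ, arising in a preexisting dermal nevus")` yields `"mel"`, while a report mentioning both a basal cell carcinoma and a seborrheic keratosis is treated as a collision and excluded.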

Manual quality review

A final manual screening and validation round was performed on all images to exclude cases with the following attributes:

Type: Close-up and overview images that were not removed with automatic filtering

Identifiability: Images with potentially identifiable content such as garment, jewelry or tattoos

Quality: Images that were out of focus or had disturbing artifacts like obstructing gel bubbles. We specifically tolerated the presence of terminal hairs.

Content: Completely non-pigmented lesions and ocular, subungual or mucosal lesions

Remaining cases were reviewed for appropriate color reproduction and luminance and, if necessary, corrected via manual histogram correction.

Code availability

Custom generated code for the described methods is available at https://github.com/ptschandl/HAM10000_dataset.

Data Records

All data records of the HAM10000 dataset are deposited at the Harvard Dataverse (Data Citation 1). Table 1 shows a summary of the number of images in the HAM10000 training-set according to diagnosis in comparison to existing databases.

Table 1 Summary of publicly available dermatoscopic image datasets in comparison to HAM10000.

Images and metadata are also accessible at the public ISIC-archive through the archive gallery as well as through standardized API-calls (https://isic-archive.com/api/v1).

Rosendahl image set (Australia)

Lesions of this part of the HAM10000 dataset originate from the office of the skin cancer practice of Cliff Rosendahl (CR, School of Medicine, University of Queensland). We extracted data from his database after institutional ethics board approval (University of Queensland, Protocol-No. 2017001223). Images were solely taken by author CR with either a DermLite Fluid (non-polarized) or DermLite DL3 (3Gen, San Juan Capistrano, CA, USA) with immersion fluid (either 70% ethanol hand-wash gel, or ultrasound gel). Digital images were stored within PowerPoint™ presentations. Each slide contained a dermatoscopic image and a text-field with a consecutive unique lesion ID linking the image to a separate Excel database with clinical metadata and histopathologic diagnoses. Images were documented consecutively in this manner from 2008 until May 2017. The Rosendahl series consists of 34.2GB of digital slides (122 PowerPoint™ files) from which we extracted 36802 images with a matching histopathologic report as described above. After removal of non-pigmented lesions, overviews, close-ups without dermatoscopy, and cases without a diagnosis or with a diagnosis that did not fall into one of the predefined generic classes, we arrived at the final dataset described in Table 1.

ViDIR image set (Austria)

From the ViDIR Group (Department of Dermatology at the Medical University of Vienna, Austria) data-sources from different times were available and processed after ethics committee approval at the Medical University of Vienna (Protocol-No. 1804/2017).

Legacy Diapositives

The oldest resource of images of the ViDIR group dates back to the era before the availability of digital cameras when dermatoscopic images were taken with analog cameras and archived as diapositives. They were originally photographed with the Heine Dermaphot system using immersion fluid, and produced for educational and archival purposes with the E-6 method¹⁴.

Current Database

Since 2005 we documented digital dermatoscopic images with a DermLite™ FOTO (3Gen™) system (single cases also with Heine Delta 20) and stored images and meta-data in a central database. This dataset includes triplets from the same lesion taken with different magnifications to enable visualization of local features and general patterns.

MoleMax Series

The Department of Dermatology in Vienna offers digital dermatoscopic follow-up to high risk patients to increase the number of excisions of early melanoma while decreasing unnecessary excisions of nevi¹⁵. The time interval between follow-up visits usually ranges from 6 to 12 months. Rinner et al.¹⁶ recently published a detailed description of this follow-up program. Since 2015 the MoleMax HD System (Derma Medical Systems, Vienna, Austria) has been used for acquisition and storage of dermatoscopic images. Most patients in the follow-up program have multiple nevi. Dermatologists of the Department of Dermatology in Vienna usually monitor "atypical" nevi but also a small number of randomly selected inconspicuous nevi, which are commonly underrepresented in datasets used for machine-learning.

Technical Validation

If necessary, an expert dermatologist performed manual histogram correction to adjust images for visual inspection by human readers. We restricted corrections to underexposed images and images with a visible yellow or green hue and made no other adjustments. To illustrate changes, we conducted color illuminant estimations after corrections according to the grey-world assumption¹⁷ as shown in Fig. 3. As desired, corrections shifted yellow and green illuminants towards blue and red.

We defined four different types of ground truth, which are described in the subsections that follow.

**Figure 3: Illuminant color estimation of the dataset.**
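Under the grey-world assumption, the average color of a scene is taken to be achromatic, so the per-channel mean of an image serves as an estimate of the illuminant color. The sketch below shows this basic form in plain Python on a list of RGB tuples. It is an illustration of the assumption only, and the function name and input format are hypothetical rather than taken from the authors' code.

```python
import math


def grey_world_illuminant(pixels):
    """Estimate the illuminant as the unit-length mean RGB vector."""
    sums = [0.0, 0.0, 0.0]
    count = 0
    for red, green, blue in pixels:
        sums[0] += red
        sums[1] += green
        sums[2] += blue
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    means = [channel_sum / count for channel_sum in sums]
    norm = math.sqrt(sum(mean * mean for mean in means))
    if norm == 0:
        # A completely black image carries no color cue; report neutral.
        return (1 / math.sqrt(3),) * 3
    return tuple(mean / norm for mean in means)
```

A neutral image gives three equal components of about 0.577 each, whereas a yellowish image gives a blue component that is clearly smaller than the red and green ones.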

Histopathology

Histopathologic diagnoses of excised lesions have been performed by specialized dermatopathologists. We scanned all available histopathologic slides of the current ViDIR image set for later review. We manually reviewed all images with the corresponding histopathologic diagnosis and checked for plausibility. If the histopathologic diagnosis was implausible we checked for sample mismatch and reviewed the report and reexamined the slide if necessary. We excluded cases with ambiguous histopathologic diagnoses (for example: "favor nevus but cannot rule out evolving melanoma in situ").

Confocal

Reflectance confocal microscopy is an in-vivo imaging technique with a resolution at near-cellular level¹⁸, and some facial benign keratoses were verified by this method. Most cases were included in a prospective confocal study conducted at the Department of Dermatology at the Medical University of Vienna that also included follow-up for one year¹⁹.

Follow-up

If nevi monitored by digital dermatoscopy did not show any changes during 3 follow-up visits or 1.5 years we accepted this as evidence of biologic benignity. Only nevi, but no other benign diagnoses were labeled with this type of ground-truth because dermatologists usually do not monitor dermatofibromas, seborrheic keratoses, or vascular lesions. The presence of change was assessed by author HK who has more than 20 years of experience in digital dermatoscopic follow-up.

Consensus

For typical benign cases without histopathology or follow-up we provide an expert-consensus rating of authors PT and HK. We applied the consensus label only if both authors independently gave the same unequivocal benign diagnosis. Lesions with this type of ground-truth were usually photographed for educational reasons and did not need further follow-up or biopsy for confirmation.
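The follow-up and consensus criteria are simple decision rules and can be written down directly. The sketch below is illustrative only: the function names and the use of `None` for an equivocal rating are hypothetical, and treating exactly 1.5 years as sufficient is an assumption.

```python
def follow_up_confirms_benign(unchanged_visits, years_unchanged):
    """True if a nevus showed no change over 3 visits or 1.5 years."""
    return unchanged_visits >= 3 or years_unchanged >= 1.5


def consensus_label(rating_a, rating_b):
    """Return the shared diagnosis of two independent raters, else None.

    A rating of None stands for an equivocal or missing rating.
    """
    if rating_a is None or rating_b is None:
        return None
    return rating_a if rating_a == rating_b else None
```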

Usage Notes

The HAM10000 training set includes pigmented lesions from different populations. The Austrian image set consists of lesions of patients referred to a tertiary European referral center specialized in early detection of melanoma in high-risk groups. These patients often have a high number of nevi and a personal or family history of melanoma. The Australian image set includes lesions from patients of a primary care facility in a high skin cancer incidence area. Australian patients are typified by severe chronic sun damage. Chronically sun-damaged skin is characterized by multiple solar lentigines and ectatic vessels, which are often present in the periphery of the target lesion. Very rarely, small angiomas and seborrheic keratoses may also collide with the target lesion. We did not remove this "noise", nor did we remove terminal hairs, because both reflect the situation in clinical practice. In most cases, albeit not always, the target lesion is in the center of the image. Dermatoscopic images of both study sites were taken by different devices using polarized and non-polarized dermatoscopy.

The set includes representative examples of pigmented skin lesions that are practically relevant. More than 95% of all lesions encountered during clinical practice will fall into one of the seven diagnostic categories. In practice, the task of the clinician is to differentiate between malignant and benign lesions, but also to make specific diagnoses because different malignant lesions, for example melanoma and basal cell carcinoma, may be treated in a different way and timeframe. With the exception of vascular lesions, which are pigmented by hemoglobin and not by melanin, all lesions have variants that are completely devoid of pigment (for example amelanotic melanoma). Non-pigmented lesions, which are more diverse and have a larger number of possible differential diagnoses, are not part of this set.

The following description of diagnostic categories is meant for computer scientists who are not familiar with the dermatology literature:

akiec

Actinic Keratoses (Solar Keratoses) and Intraepithelial Carcinoma (Bowen’s disease) are common non-invasive variants of squamous cell carcinoma that can be treated locally without surgery. Some authors regard them as precursors of squamous cell carcinomas and not as actual carcinomas. There is, however, agreement that these lesions may progress to invasive squamous cell carcinoma, which is usually not pigmented. Both neoplasms commonly show surface scaling and commonly are devoid of pigment. Actinic keratoses are more common on the face and Bowen’s disease is more common on other body sites. Because both types are induced by UV-light the surrounding skin is usually typified by severe sun damage, except in cases of Bowen’s disease that are caused by human papilloma virus infection and not by UV. Pigmented variants exist for Bowen’s disease²⁰ and for actinic keratoses²¹, and both are included in this set.

The dermatoscopic criteria of pigmented actinic keratoses and Bowen’s disease are described in detail by Zalaudek et al.²²,²³ and by Cameron et al.²⁰.

bcc

Basal cell carcinoma is a common variant of epithelial skin cancer that rarely metastasizes but grows destructively if untreated. It appears in different morphologic variants (flat, nodular, pigmented, cystic), which are described in more detail by Lallas et al.²⁴.

bkl

"Benign keratosis" is a generic class that includes seborrheic keratoses ("senile wart"), solar lentigo, which can be regarded as a flat variant of seborrheic keratosis, and lichen planus-like keratoses (LPLK), which correspond to a seborrheic keratosis or a solar lentigo with inflammation and regression²⁵. The three subgroups may look different dermatoscopically, but we grouped them together because they are similar biologically and often reported under the same generic term histopathologically. From a dermatoscopic view, lichen planus-like keratoses are especially challenging because they can show morphologic features mimicking melanoma²⁶ and are often biopsied or excised for diagnostic reasons. The dermatoscopic appearance of seborrheic keratoses varies according to anatomic site and type²⁷.

df

Dermatofibroma is a benign skin lesion regarded as either a benign proliferation or an inflammatory reaction to minimal trauma. The most common dermatoscopic presentation is reticular lines at the periphery with a central white patch denoting fibrosis²⁸.

nv

Melanocytic nevi are benign neoplasms of melanocytes and appear in a myriad of variants, which all are included in our series. The variants may differ significantly from a dermatoscopic point of view. In contrast to melanoma they are usually symmetric with regard to the distribution of color and structure²⁹.

mel

Melanoma is a malignant neoplasm derived from melanocytes that may appear in different variants. If excised in an early stage it can be cured by simple surgical excision. Melanomas can be invasive or non-invasive (in situ). We included all variants of melanoma including melanoma in situ, but did exclude non-pigmented, subungual, ocular or mucosal melanoma.

Melanomas are usually, albeit not always, chaotic, and some melanoma specific criteria depend on anatomic site²³,³⁰.

vasc

Vascular skin lesions in the dataset range from cherry angiomas to angiokeratomas³¹ and pyogenic granulomas³². Hemorrhage is also included in this category.

Angiomas are dermatoscopically characterized by red or purple color and solid, well circumscribed structures known as red clods or lacunes.

The number of images in the datasets does not correspond to the number of unique lesions, because we also provide images of the same lesion taken at different magnifications or angles (Fig. 4), or with different cameras. This should serve as a natural data-augmentation as it shows random transformations and visualizes both general and local features.

**Figure 4: Example triplet of the same lesion.**
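Because several images can show the same lesion, a random split at image level may place views of one lesion in both the training and the validation part. One way to avoid this is to split by lesion identifier, sketched below. The `(image_id, lesion_id)` input format, the function name, and the default fraction are hypothetical and only serve the illustration.

```python
import random
from collections import defaultdict


def split_by_lesion(rows, val_fraction=0.2, seed=0):
    """Split image ids into train/validation lists without lesion overlap.

    rows: iterable of (image_id, lesion_id) pairs.
    """
    images_by_lesion = defaultdict(list)
    for image_id, lesion_id in rows:
        images_by_lesion[lesion_id].append(image_id)

    lesion_ids = sorted(images_by_lesion)
    random.Random(seed).shuffle(lesion_ids)  # reproducible shuffle
    n_val = max(1, round(len(lesion_ids) * val_fraction))
    val_lesions = set(lesion_ids[:n_val])

    train, val = [], []
    for lesion_id, image_ids in images_by_lesion.items():
        (val if lesion_id in val_lesions else train).extend(image_ids)
    return train, val
```

The fraction applies to lesions, not images, so the share of validation images can deviate from `val_fraction` when lesions have different numbers of images.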

Images in the dataset may diverge from those an end-user, especially lay persons, would provide in a real-world scenario. For example, insufficiently magnified images of small lesions or out of focus images were removed.

We used automated screening by neural networks and multiple manual reviews, and cleared the EXIF data of the images, to remove any potentially identifiable information. Data can thus be regarded as anonymized to the best of our knowledge, and the data collection was approved by the ethics review committees at the Medical University of Vienna and the University of Queensland.

Additional information

How to cite this article: Tschandl, P. et al. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5:180161 doi: 10.1038/sdata.2018.161 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Rosendahl, C., Tschandl, P., Cameron, A. & Kittler, H. Diagnostic accuracy of dermatoscopy for melanocytic and nonmelanocytic pigmented lesions. J Am Acad Dermatol 64, 1068–1073 (2011).
Article PubMed Google Scholar
Binder, M. et al. Application of an artificial neural network in epiluminescence microscopy pattern analysis of pigmented skin lesions: a pilot study. Br J Dermatol 130, 460–465 (1994).
Article CAS PubMed Google Scholar
Codella, N. C. F. et al. Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC). Preprint at https://arxiv.org/abs/1710.05006 (2017).
Deng, J. et al. ImageNet: A large-scale hierarchical image database, 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, 2009, pp. 248–255 (2009).
Mendonça, T., Ferreira, P. M., Marques, J. S., Marcal, A. R. S. & Rozeira, J. PH2 - A dermoscopic image database for research and benchmarking, 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, 2013, pp. 5437–5440 (2013).
Argenziano, G. et al. Interactive Atlas of Dermoscopy (Edra Medical Publishing and New Media: Milan, 2000).
Google Scholar
Dreiseitl, S., Binder, M., Hable, K. & Kittler, H. Computer versus human diagnosis of melanoma: evaluation of the feasibility of an automated diagnostic system in a prospective clinical trial. Melanoma Res 19, 180–184 (2009).
Article PubMed Google Scholar
Kharazmi, P., Kalia, S., Lui, H., Wang, Z. J. & Lee, T. K. A feature fusion system for basal cell carcinoma detection through data-driven feature learning and patient profile. Skin Res Technol 24, 256–264 (2017).
Article PubMed Google Scholar
Sinz, C. et al. Accuracy of dermatoscopy for the diagnosis of nonpigmented cancers of the skin. J Am Acad Dermatol 77, 1100–1109 (2017).
Article PubMed Google Scholar
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Article CAS ADS PubMed PubMed Central Google Scholar
Han, S. S. et al. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol, Preprint at https://doi.org/10.1016/j.jid.2018.01.028 (2018).
Han, S. S. et al. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network. PLoS ONE 13, 1–14 (2018).
CAS Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the Inception Architecture for Computer Vision. Preprint at https://arxiv.org/abs/1512.00567 (2015).
Kodak professional chemicals, Process E-6 and Process E-6AR https://125px.com/docs/techpubs/kodak/j83-2005_11.pdf (2005).
Salerni, G. et al. Meta-analysis of digital dermoscopy follow-up of melanocytic skin lesions: a study on behalf of the International Dermoscopy Society. J Eur Acad Dermatol Venereol 27, 805–814 (2013).
Article CAS PubMed Google Scholar
Rinner, C., Tschandl, P., Sinz, C. & Kittler, H. Long-term evaluation of the efficacy of digital dermatoscopy monitoring at a tertiary referral center. J Dtsch Dermatol Ges 15, 517–522 (2017).
PubMed Google Scholar
Van de Weijer, J., Gevers, T. & Gijsenij, A. Edge-Based Color Constancy. IEEE Trans Image Processing 16, 2207–2214 (2007).
Article ADS MathSciNet Google Scholar
Stevenson, A. D., Mickan, S., Mallett, S. & Ayya, M. Systematic review of diagnostic accuracy of reflectance confocal microscopy for melanoma diagnosis in patients with clinically equivocal skin lesions. Dermatol Pract Concept 3, 19–27 (2013).
Article PubMed PubMed Central Google Scholar
Wurm, E. et al. The value of reflectance confocal microscopy in diagnosis of flat pigmented facial lesions: a prospective study. J Eur Acad Dermatol Venereol 31, 1349–1354 (2017).
Article CAS PubMed Google Scholar
Cameron, A., Rosendahl, C., Tschandl, P., Riedl, E. & Kittler, H. Dermatoscopy of pigmented Bowen’s disease. J Am Acad Dermatol 62, 597–604 (2010).
Article PubMed Google Scholar
Akay, B. N., Kocyigit, P., Heper, A. O. & Erdem, C. Dermatoscopy of flat pigmented facial lesions: diagnostic challenge between pigmented actinic keratosis and lentigo maligna. Br J Dermatol 163, 1212–1217 (2010).
Article CAS PubMed Google Scholar
Zalaudek, I. et al. Dermatoscopy of facial actinic keratosis, intraepidermal carcinoma, and invasive squamous cell carcinoma: a progression model. J. Am. Acad. Dermatol. 66, 589–597 (2012).
Article PubMed Google Scholar
Tschandl, P., Rosendahl, C. & Kittler, H. Dermatoscopy of flat pigmented facial lesions. J Eur Acad Dermatol Venereol 29, 120–127 (2015).
Article CAS PubMed Google Scholar
Lallas, A. et al. The dermatoscopic universe of basal cell carcinoma. Dermatol Pract Concept 4, 11–24 (2014).
Article PubMed PubMed Central Google Scholar
Zaballos, P. et al. Studying regression of seborrheic keratosis in lichenoid keratosis with sequential dermoscopy imaging. Dermatology 220, 103–109 (2010).
Article PubMed Google Scholar
Moscarella, E. et al. Lichenoid keratosis-like melanomas. J Am Acad Dermatol 65, e85, Van de (2011).
Article PubMed Google Scholar
Braun, R. P. et al. Dermoscopy of pigmented seborrheic keratosis: a morphological study. Arch Dermatol 138, 1556–1560 (2002).
Article PubMed Google Scholar
Zaballos, P., Puig, S., Llambrich, A. & Malvehy, J. Dermoscopy of dermatofibromas: a prospective morphological study of 412 cases. Arch Dermatol 144, 75–83 (2008).
PubMed Google Scholar
Rosendahl, C., Cameron, A., McColl, I. & Wilkinson, D. Dermatoscopy in routine practice - ’chaos and clues’. Aust Fam Physician 41, 482–487 (2012).
PubMed Google Scholar
Schiffner, R. et al. Improvement of early recognition of lentigo maligna using dermatoscopy. J. Am. Acad. Dermatol. 42, 25–32 (2000).
Article CAS PubMed Google Scholar
Zaballos, P. et al. Dermoscopy of solitary angiokeratomas: a morphological study. Arch Dermatol 143, 318–325 (2007).
PubMed Google Scholar
Zaballos, P. et al. Dermoscopy of pyogenic granuloma: a morphological study. Br J Dermatol 163, 1229–1237 (2010).

Data Citations

Tschandl, P. Harvard Dataverse https://doi.org/10.7910/DVN/DBW86T (2018)

Acknowledgements

We want to thank Andreas Ebner (photographer at the Department of Dermatology, Medical University of Vienna) for archiving and acquiring most of the ViDIR dataset images. We would further like to thank Giuliana Petronio for scanning the diapositives and Christoph Sinz, M.D. for invaluable help in continuously curating the ViDIR databases. We are grateful to our collaborators of the ISIC-Archive and ISIC 2018 challenge: M. Emre Celebi Ph.D. (University of Central Arkansas, Arkansas, USA), Noel C. F. Codella Ph.D. (IBM Research, New York, USA), Kristin Dana Ph.D. (Rutgers University, New Jersey, USA), David Gutman M.D. (Emory University, Georgia, USA), Allan Halpern M.D. (Memorial Sloan Kettering Cancer Center, New York, USA) and Brian Helba (Kitware, New York, USA). They made hosting and immediate application of the dataset possible and provided valuable input during workup of the data.

Author information

Authors and Affiliations

Department of Dermatology, ViDIR Group, Medical University of Vienna, Vienna 1090, Austria
Philipp Tschandl & Harald Kittler
Faculty of Medicine, University of Queensland, Herston 4006, Australia
Cliff Rosendahl

Contributions

P.T. wrote the Data Descriptor, and performed data handling, image processing and analysis, expert annotation and quality review. C.R. wrote the Data Descriptor, and collected all cases from the Rosendahl-Series. H.K. wrote the Data Descriptor, and performed image annotation, extraction of MoleMax Series, and quality review of all images.

Corresponding author

Correspondence to Philipp Tschandl.

Ethics declarations

Competing interests

The authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.

About this article

Cite this article

Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 5, 180161 (2018). https://doi.org/10.1038/sdata.2018.161

Received: 25 April 2018
Accepted: 26 June 2018
Published: 14 August 2018
DOI: https://doi.org/10.1038/sdata.2018.161
