Dermatologists diagnose a wide range of skin diseases, from skin cancers to inflammatory conditions such as atopic dermatitis and psoriasis. However, an estimated 3 billion people worldwide have inadequate access to medical care for skin disease. Even in the United States, there is a shortage of dermatologists and unequal access to them, leading to long wait times. One proposed solution is the use of artificial intelligence (AI) tools, which could help triage skin diseases and identify individuals who may have them. Over the last few years, there has been rapid development of algorithms that claim to detect cutaneous malignancies. However, we recently published a review of AI datasets for dermatology and found that most datasets are not publicly available, lack important information about dataset diversity, and have noisy diagnostic labels. Currently, publicly available datasets lack biopsy-proven skin lesions in dark skin tones.
To train and test AI algorithms in dermatology, we need diverse, validated benchmarks. We curated the Diverse Dermatology Images (DDI) dataset to meet this need: it is the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.
Clinical application: Given the long wait times to see a dermatologist, AI algorithms could help triage benign versus malignant lesions. However, expertly labeled data that represents diverse skin tones is needed to ensure that algorithms perform fairly across all groups.
Labeling: The images included in the DDI dataset were retrospectively selected by reviewing pathology reports from Stanford Clinics from 2010 to 2020; further details are in our paper. There are 656 images representing 570 unique patients. Each image label was expertly curated: skin tone was labeled based on in-person evaluation at the clinic visit, cross-referenced against demographic photos and a review of the clinical images by two board-certified dermatologists. Each diagnosis was based on pathology reports from biopsy: these reports and the corresponding images were reviewed by a board-certified dermatologist and dermatopathologist.
Skin tone comparison: The dataset comprises a retrospective convenience sample of images across Fitzpatrick skin types I-VI, but it was also designed to allow direct comparison between Fitzpatrick I-II and Fitzpatrick V-VI by matching on diagnostic category, age within 10 years, gender, and date of photograph within 3 years. The images are not meant to be textbook examples; rather, they represent the kind of clinical photos that AI algorithms may encounter in practice. This design allows us to evaluate previously developed state-of-the-art diagnostic algorithms across skin tones. During the de-identification process prior to data release, some of the images were cropped further to protect patient privacy. However, the main lesions were preserved during this process.
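The matching criteria described above can be made concrete with a short Python sketch. This is only an illustration of the stated rules, not the procedure used to build DDI: the `LesionRecord` class, its field names, and the `is_matched_pair` function are hypothetical, the age and date windows are assumed to be inclusive, and "3 years" is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LesionRecord:
    """Hypothetical record; field names are illustrative, not the DDI schema."""
    fitzpatrick_group: str      # "I-II", "III-IV", or "V-VI"
    diagnostic_category: str
    age_years: int
    gender: str
    photo_date: date


def is_matched_pair(a: LesionRecord, b: LesionRecord,
                    max_age_gap_years: int = 10,
                    max_photo_gap_days: int = 3 * 365) -> bool:
    """Return True if one record is Fitzpatrick I-II, the other is V-VI,
    and they agree on the matching criteria (windows assumed inclusive)."""
    one_from_each_group = {a.fitzpatrick_group, b.fitzpatrick_group} == {"I-II", "V-VI"}
    return (
        one_from_each_group
        and a.diagnostic_category == b.diagnostic_category
        and a.gender == b.gender
        and abs(a.age_years - b.age_years) <= max_age_gap_years
        and abs((a.photo_date - b.photo_date).days) <= max_photo_gap_days
    )
```

A pair passes only when all four criteria hold and the two records come from opposite ends of the Fitzpatrick scale; records in Fitzpatrick III-IV never form a matched pair under this sketch.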
Further description of the dataset is available in our NeurIPS Machine Learning 4 Health workshop extended abstract.
Model code is available here. It includes a PyTorch custom dataset for DDI as well as our evaluation code.
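The released evaluation code is not reproduced here. As a rough sketch of what a comparison across skin tones could look like, the pure-Python function below computes sensitivity and specificity separately for each skin tone group. The function name, the group labels, and the convention that 1 means malignant and 0 means benign are all assumptions made for illustration; they are not taken from the released code.

```python
from collections import defaultdict


def per_group_sensitivity_specificity(groups, y_true, y_pred):
    """Compute sensitivity and specificity for each skin tone group.

    groups: group label per image (e.g. "I-II", "V-VI"); labels are illustrative.
    y_true: ground-truth labels, assumed 1 = malignant, 0 = benign.
    y_pred: binary model predictions with the same convention.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, truth, pred in zip(groups, y_true, y_pred):
        if truth and pred:
            counts[group]["tp"] += 1
        elif truth and not pred:
            counts[group]["fn"] += 1
        elif not truth and not pred:
            counts[group]["tn"] += 1
        else:
            counts[group]["fp"] += 1

    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            # None marks a metric that is undefined for this group.
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return results
```

Reporting the metrics per group, rather than pooled over all images, is what makes a performance gap between Fitzpatrick I-II and Fitzpatrick V-VI visible.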
1. Permission is granted to view and use the Diverse Dermatology Images Dataset without charge for personal, non-commercial research purposes only. Any commercial use, sale, or other monetization is prohibited.
2. Other than the rights granted herein, the Stanford University School of Medicine (“School of Medicine”) retains all rights, title, and interest in the Diverse Dermatology Images Dataset.
3. You may make a verbatim copy of the Diverse Dermatology Images Dataset for personal, non-commercial research use as permitted in this Research Use Agreement. If another user within your organization wishes to use the Diverse Dermatology Images Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.
4. YOU MAY NOT DISTRIBUTE, PUBLISH, OR REPRODUCE A COPY of any portion or all of the Diverse Dermatology Images Dataset to others without specific prior written permission from the School of Medicine.
5. YOU MAY NOT SHARE THE DOWNLOAD LINK to the Diverse Dermatology Images Dataset with others. If another user within your organization wishes to use the Diverse Dermatology Images Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.
6. You must not modify, reverse engineer, decompile, or create derivative works from the Diverse Dermatology Images Dataset. You must not remove or alter any copyright or other proprietary notices in the Diverse Dermatology Images Dataset.
7. The Diverse Dermatology Images Dataset has not been reviewed or approved by the Food and Drug Administration, and is for non-clinical, Research Use Only. In no event shall data or images generated through the use of the Diverse Dermatology Images Dataset be used or relied upon in the diagnosis or provision of patient care.
8. THE Diverse Dermatology Images DATASET IS PROVIDED "AS IS," AND STANFORD UNIVERSITY AND ITS COLLABORATORS DO NOT MAKE ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, NOR DO THEY ASSUME ANY LIABILITY OR RESPONSIBILITY FOR THE USE OF THIS Diverse Dermatology Images DATASET.
9. You will not make any attempt to re-identify any of the individual data subjects. Re-identification of individuals is strictly prohibited. Any re-identification of any individual data subject shall be immediately reported to the School of Medicine.
10. Any violation of this Research Use Agreement or other impermissible use shall be grounds for immediate termination of use of this Diverse Dermatology Images Dataset. In the event that the School of Medicine determines that the recipient has violated this Research Use Agreement or other impermissible use has been made, the School of Medicine may direct that the undersigned data recipient immediately return all copies of the Diverse Dermatology Images Dataset and retain no copies thereof even if you did not cause the violation or impermissible use.
In consideration for your agreement to the terms and conditions contained here, Stanford grants you permission to view and use the Diverse Dermatology Images Dataset for personal, non-commercial research. You may not otherwise copy, reproduce, retransmit, distribute, publish, commercially exploit or otherwise transfer any material.
You may use the Diverse Dermatology Images Dataset for legal purposes only.
You agree to indemnify and hold Stanford harmless from any claims, losses or damages, including legal fees, arising out of or resulting from your use of the Diverse Dermatology Images Dataset or your violation or role in violation of these Terms. You agree to fully cooperate in Stanford’s defense against any such claims. These Terms shall be governed by and interpreted in accordance with the laws of California.
For inquiries, contact us at email@example.com.