Background & Summary

Digital mammography (DM) is the gold standard imaging modality for early detection of breast cancer. However, limitations exist in patients with dense breasts as its overall sensitivity decreases1. Contrast-enhanced spectral mammography (CESM) is a contrast-based digital mammogram that has been approved by the Food and Drug Administration (FDA) in 2011 to be used as an adjunct to DM and ultrasound examinations for localization and characterization of occult or inconclusive lesions. Dual-energy image acquisition is performed where low and high-energy images are obtained. Several studies proved that low-energy images obtained appear like the standard DM images and are non-inferior to them2. High-energy images are non-interpretable; to overcome this, low and high-energy images are recombined and subtracted through appropriate image processing to suppress the background breast parenchyma after the acquisition. Figure 1 shows the resulting subtracted images obtained for interpretation, revealing contrast enhancement areas in a suppressed breast tissue background. Findings could be identified according to their density, morphologic, and enhancement characteristics3. However, estimating whether a lesion is benign or malignant without being seen by a radiologist is challenging due to the significant variation in the lesions’ visual characteristics4.

Fig. 1
figure 1

(a) Low-energy, (b) High-energy, and (c) Subtracted image.

Computer-aided detection (CAD) systems were introduced in the early 2000’s to help radiologists interpret mammography images. However, this proved to be challenging in clinical practice due to the increased rate of false positives marked by the CAD systems, which can distract the radiologists5. Currently, the use of artificial intelligence (AI) in radiology is still in its early stages. Nonetheless, algorithms that analyze pixel data distinguish patterns from images that might not have been previously identified even by expert radiologists6. Deep learning (DL) has a promising potential in performing many tasks such as automatically detecting lesions and helping radiologists provide a more accurate diagnosis. Moreover, new multimodal DL models like the perceiver7 make it feasible to train on large datasets and extract good unsupervised image representations that can be used on a wide range of tasks. However, fully annotated and large-sized datasets are required and will be crucial for training new DL networks or fine-tuning existing pre-trained DL networks and evaluating them. This is why it is important for radiologists to understand the impact of these machine-learning (ML) based analytical tools and recognize how they might influence and change the radiological practice soon8.

In the past couple of years, a small number of public mammography datasets were released, including the Digital Database for Screening Mammography (DDSM)9, the Image Retrieval in Medical Applications (IRMA) project10, the Mammographic Imaging Analysis Society (MIAS) database11, and the Curated Breast Imaging Subset of DDSM (CBIS-DDSM)012. These datasets contain DM images only, and none include CESM images.

In this paper, we present a CESM categorized dataset that provides easily-accessible low energy images with corresponding subtracted CESM images, abnormality segmentation annotation, verified medical reports, and pathological diagnosis for all cases. It will add to the ongoing advancements in future mammography DL-based systems. We also propose a new DL-based technique to automatically segment the abnormal findings in the images without intervention from radiologists, as segmentation annotation is a time-consuming task.

Methods

We collected and reformatted the data into an easily-accessible format. Figure 2 displays the flow diagram of the process to prepare our dataset: image preprocessing, manual annotations, and the automatic segmentation.

Fig. 2
figure 2

Flow diagram of the preparation of (CDD-CESM) and the deep learning method to automatically generate the segmentation annotation.

Technique of contrast enhanced mammography examination

CESM is done using the standard DM equipment but with additional software that performs dual-energy image acquisition. Two minutes after intravenously injecting the patient with non-ionic low-osmolar iodinated contrast material (dose: 1.5 mL/kg), craniocaudal (CC) and mediolateral oblique (MLO) views are obtained. Each view comprises two exposures, one with low energy (peak kilo-voltage values ranging from 26 to 31kVp) and one with high energy (45 to 49 kVp). A complete examination is carried out in about 5–6 minutes.

Description of dataset

The dataset is a collection of low-energy images with their corresponding subtracted CESM images gathered from the Radiology Department of the National Cancer Institute, Cairo University, Egypt over the period from January 2019 to February 2021. The images are all high resolution with an average of 2355 × 1315 pixels. Institutional review board approval and patient informed consent to carry out and publish data were obtained from 326 female patients aged from 18 to 90 years. The dataset contains 2006 images with CC and MLO views (1003 low energy images and 1003 subtracted CESM images), samples of low energy and subtracted CESM images are shown in Fig. 3. Usually, each patient has a total of 8 images, 4 images for each breast side consisting of low energy and subtracted CESM images for each CC and MLO view. However, there are 46 patients with only 4 images as they had mastectomy on a breast side, and 87 patients with missing images as some were not available or removed due to quality concerns. Two different machines were used for image acquisition; GE Healthcare Senographe DS and Hologic Selenia Dimensions Mammography Systems. The two machines provide similar quality, and all other steps in the data acquisition and post-processing phases were kept the same. The images are manually-annotated by expert radiologists according to the American College of Radiology Breast Imaging Reporting and Data System (ACR BIRADS) 2013 lexicon for standardized descriptors13. The annotations, shown in Table 1, include breast composition, mass shape, mass margin, mass density, architectural distortion, asymmetries, calcification type, calcification distribution, mass enhancement pattern, non-mass enhancement pattern, non-mass enhancement distribution, and overall BIRADS assessment (1 to 6). Both follow-up and pathological results are also included in the annotations, as pathological results are the gold-standard reference for radiologically-suspicious or malignant-looking lesions, and follow-up is the gold standard for benign-looking lesions. Moreover, full medical reports, written by an ensemble of radiologists, are provided for each case along with manual segmentation annotation for the abnormal findings in each image.

Fig. 3
figure 3

Samples of low energy and subtracted CESM images from the dataset.

Table 1 Descriptions of the annotations available for the dataset.

Annotations

Data are gathered and stored in a DICOM format. Some irrelevant annotations that are not used for lesion identification and classification were removed, including the patient’s name, ID, date of the study, and the image series. Each image with its corresponding annotation was compiled into one comma-separated-value (CSV) file.

Medical reports

Separate corresponding reports for the CESM images and the DM images are also included in the dataset. Each report consists of the findings, depicted for each breast side separately, written following the ACR BIRADS 2013 lexicon for standardized descriptors and reporting associated with the BIRADS category annotated for the case. All patients’ identification data were removed. We believe that releasing the full-text medical reports is important, as research studies concerned with radiology report-writing often struggle with the lack of full reports not being present in large datasets14.

Image processing

DICOM images were exported losslessly to a joint photographic experts group (JPEG) format using RadiAnt DICOM viewer application(https://www.radiantviewer.com/). After automatically removing all irrelevant data from each image, around 30% of the images were manually cropped to eliminate all unused and irrelevant boundaries. Furthermore, the images are named as follows {patient number}_{breast side}_{image type}_{image view}; example ‘P1_L_CM_MLO’.

Segmentation visual model

In this section, we describe our method to automatically segment the abnormal parts of the images. A deep learning model, EfficientNetB0, was trained to predict the overall diagnosis (Normal, Benign, Malignant). GradCam15 was used to generate highlights for the parts of the image that contributed to the model’s prediction. A threshold of the top 25% GradCam intensities is then used on the highlights to generate the segments. Furthermore, a threshold of the top 15% white pixels is used to further finetune the segmentations.

Preprocessing

The images were first resized to be 224 × 224 using interpolation and anti-aliasing. Then the images were normalized by subtracting from the mean and dividing by the standard deviation. Random image augmentations were also used like cropping, zooming, and horizontal flipping. Furthermore, we experimented with non-traditional data augmentation methods16 which uses generative adversarial networks (GANs) to generate new images. However, the generated images did not satisfy the experts, so only traditional data augmentations were used.

Model & training

An EfficientNetB017, pre-trained on ImageNet18, was used as the starting model in our experiments. We finetuned the model by removing the final layer and adding a layer with three output classes (Normal, Benign, Malignant). All the weights are left to be fine-tuned during the training. Categorical cross-entropy was used as the loss function with Adam optimizer19 as shown in Eq. 1, where CE(b) is the cross entropy loss for batch b, C the number of classes, N the number of images in the batch, y is the ground-truth, and \(\widehat{y}\) is the prediction. A batch size of 16 was used, a decaying learning rate of 1e-3, and a dropout layer20 with a drop probability of 0.8 on the final visual features was used before the classifier.

$$CE(b)=-\mathop{\sum }\limits_{c=1}^{C}\mathop{\sum }\limits_{i=1}^{N}{y}_{{i}_{c}}.log\;{\widehat{y}}_{{i}_{c}}$$
(1)

Highlights

After the model achieved a good accuracy on all the images, we used GradCam15 to get heatmaps representing the parts of the image that had the highest impact on the model’s decision. The heatmaps are traced back from the ground-truth class and not the predicted class. Moreover, we removed any highlights in the corners of the image as they are often present at the location of normal pectoral muscles.

Segmentation

To get the actual pixel segmentation, we used the top 25% of the heatmap’s intensities to serve as the abnormal segment. Moreover, to finetune the segments on the exact abnormality, we used the intersection of the segments and the top 15% white pixel intensities of the image as shown in Fig. 4.

Fig. 4
figure 4

(a) Example of the DL Gradcam highlights, (b) Segmentation calculated after applying a threshold on the highlights, (c) Final output after applying the white pixel intensity threshold, and (d) Hand-drawn segmentation annotation.

Data Records

The low energy and subtracted CESM images are distributed as JPEG files. They include both MLO and CC views of the mammograms.

Metadata for each image is incorporated as an associated CSV file consisting of:

  • Path to image files

  • Patient number

  • Breast side: Left or Right

  • Type of Examination: DM (low energy image) or CESM (subtracted image)

  • View: CC or MLO

  • Density category (if low energy image)

  • Number of findings (if multiple)

  • Mass shape, density, and margin (if present)

  • Mass enhancement pattern (if present)

  • Architectural distortion (if present)

  • Asymmetry (if present)

  • Calcification type and distribution (if present)

  • Non-mass enhancement pattern and distribution (if present)

  • BI-RADS assessment

  • Pathology: Benign or Malignant

Figure 5 shows histograms of BIRADS category and the corresponding final pathology/follow up result. Table 2 displays the characteristics of the CDD-CESM dataset.

Fig. 5
figure 5

Histograms for the CDD-CESM dataset showing distribution of (a) BIRADS category for each abnormality, (b) Benign and malignant lesions.

Table 2 Characteristics of the CDD-CESM dataset.

The CDD-CESM dataset is available21 on The Cancer Imaging Archive repository22. The dataset includes all images, annotations, and full medical reports.

Technical Validation

For the segmentation evaluation of our DL model, experienced radiologist provided hand-drawn segmentations for each abnormal finding in the CDD-CESM dataset. We calculated the intersection over union (IOU) and the dice coefficients (F1) between the computed and hand-drawn segmentations, after applying the same white-intensity threshold on the hand-drawn segmentations. Furthermore, we added another metric which we called overlap50, which is the percentage of images where the automatic segmentation overlaps with at least 50% of hand-drawn segmentation. The average IOU was 64.2% overall, overlap50 was 83.3%, and the average F1 was 71% overall. We also calculated these metrics separately for different groups of images according to the following criteria:

Different findings represented in the dataset

Mass enhancement had the highest overlap50 = 91%. Furthermore, postoperative cases had the lowest overlap50 = 77%. This might be attributed to post operative edematous changes and skin thickening that are not accurately or completely observed by our DL model.

Age of patient

Patients aged seventy years and higher had the highest overlap50 = 94%. Forty years and lower had the lowest overlap50 = 78%. As expected, the accuracy of visualization decreases as the breast density increases.

Low energy or subtracted image

Low energy image overlap50 = 81%, compared to 86% in subtracted images. This might be due to the dense adenotic tissue in low-energy images obscuring abnormalities found behind it, which are suppressed in subtracted images. Thus, we recommend that radiologists use both low energy and subtracted images for each patient in each view, to increase reliability of using our DL technique in drawing their final conclusions.

Mediolateral or Craniocaudal view

We found the results to be comparable without much difference in terms of automatic segmentation output.

Benign or malignant finding

Benign findings had the lower overlap50 = 75% compared to 90% for malignant findings. Most of the benign lesions were non-enhancing in subtracted images. Furthermore, in low-energy images, benign lesions were either hidden behind the dense breast tissues, had equal density or parallel orientation to the surrounding breast parenchyma. However, highly cellular benign findings were accurately depicted by our DL model. Decreased accuracy was found with multiplicity and retroareolar locations.

Generally, decreased accuracy of detection by our DL model was also present in some subtracted images with halo (breast-within-breast) or ripple artifacts. These calculations are shown in Table 3, and example outputs from our DL model are showed in Fig. 6.

Table 3 Detailed results of our DL segmentaion model.
Fig. 6
figure 6

Examples of different cases and their corresponding automatic segmentations.

Usage Notes

The dataset can be used to train machine learning models to classify mammogram images into normal, benign, and malignant, or classify the tags associated with each image. Moreover, it can be used to train segmentation models to segment the lesions. Furthermore, the full-text medical reports can be used to train report generation models.