Curated benchmark dataset for ultrasound based breast lesion analysis

A new detailed dataset of breast ultrasound scans (BrEaST), containing images of benign and malignant lesions as well as normal tissue examples, is presented. The dataset consists of 256 breast scans collected from 256 patients. Each scan was manually annotated and labeled by a radiologist experienced in breast ultrasound examination. In particular, each tumor was identified in the image using a freehand annotation and labeled according to BI-RADS features and lexicon. The histopathological classification of the tumor was also provided for patients who underwent a biopsy. The BrEaST dataset is the first breast ultrasound dataset containing patient-level labels, image-level annotations, and tumor-level labels with all cases confirmed by follow-up care or core needle biopsy result. To enable research into breast disease detection, tumor segmentation and classification, the BrEaST dataset is made publicly available under the CC-BY 4.0 license.


Background & Summary
Breast cancer is the most commonly diagnosed cancer in women worldwide, accounting for over 2.2 million new cases and resulting in over 650,000 deaths in 2020 1. In breast examination, ultrasound, mammography and magnetic resonance imaging are the most prevalent imaging modalities. Among them, ultrasound examination is gaining affordability and wide availability. However, it is also a highly operator-dependent modality, and the difficulty of spotting critical findings varies with the breast structure and tumor type 2. Therefore, a reliable breast ultrasound examination requires a radiologist experienced in breast diagnostic imaging following the BI-RADS guidelines of the American College of Radiology (ACR) 3. Although the atlas describes the signs of different breast abnormalities, interobserver agreement and intraobserver repeatability in breast assessment have been reported as poor 4 or at most moderate 5. To address the issue, data-driven decision systems should be developed to support radiologists' diagnoses.
Machine learning models have been developed for different clinical applications in breast examinations, such as automatic cancer detection 6, segmentation 7,8 and classification into malignant and benign breast tumors 9,10. High-quality data is a key element for selecting features, developing theoretical models and augmented inference methods 11. Dataset quality and reliability are particularly important in healthcare fields, where inaccuracies can lead to image misinterpretation and delay correct diagnosis 12. Furthermore, models often underperform when they are tested on datasets collected using different devices at different sites due to domain shift 13. This shift can be caused by differences in the set-up of the ultrasound machines or in the algorithms used for image enhancement. The process of building a dataset that satisfies these requirements is costly and time-consuming due to several constraints: (1) scans saved in the hospital's Picture Archiving and Communication System (PACS) are non-anonymized, which makes them hard to access; (2) manual annotation by an experienced radiologist is expensive; and (3) there is no efficient system for storing, labeling and annotating medical image sets.
Six breast ultrasound datasets, namely Open Access Series of Breast Ultrasonic Data 14, Breast Ultrasound Lesions Dataset 15, Medical Image Database 16, Breast Ultrasound Videos 17, Breast Ultrasound Dataset 18, and Breast Ultrasound Images Database 19, have been published in recent years. One of them 17 consists of video frames with rectangular bounding boxes; the lack of manual annotations excluded it from further consideration. The largest one 18 contains 780 images, but unfortunately, more than 40% have significant defects: duplicated images, sometimes classified differently, axilla images instead of breast images, presence of measurement markers or Color Doppler region of interest in the image, etc. 20. All other datasets are smaller in size, but they also contain images limiting their utility. Of the five remaining datasets, only one includes annotations of multiple tumors in the image and can be used for the detection and segmentation of abnormalities. Furthermore, only two datasets were labeled with a diagnosis (with three diagnoses 15 and with eight diagnoses 16), while none of them associates a diagnosis with labels of critical findings. A summary of the published datasets and the dataset 21 presented here is provided in Table 1. As the already published datasets are not detailed enough and a reliable benchmark dataset has not yet been published, the field of breast ultrasound datasets remains unexploited.
In this paper, we present an expert-annotated dataset 21 of 256 ultrasound images of the breast. The dataset consists of images of 154 benign tumors, 98 malignancies and 4 normal breasts. To provide generality to the dataset, images were collected by five radiologists at medical centers in Poland in 2019-2022. All images were manually annotated and labeled by radiologists via a purpose-built cloud-based system. The dataset contains patient-level labels, image-level annotations, and tumor-level labels with all tumors confirmed by follow-up care or biopsy result. In particular, the first stage of data collection considered clinical data as patient-level labels, i.e. age, breast tissue composition, signs and symptoms. Compared to the publicly available datasets (Table 1), the BrEaST dataset includes annotations of multi-lesion images and core needle biopsy results, and is labeled for BI-RADS features to support the BI-RADS category.

Methods
The breast ultrasound images from 256 patients were collected at medical centers in Poland in 2019-2022. Ethical approval for this study was obtained from the Bioethics Committee at the Lower Silesian Chamber of Medicine, no. 2/BNR/2022. The requirement of obtaining written informed consent from patients was waived because retrospective data collection did not affect the standard diagnostic procedures and all data were anonymized before being entered into the database. The data transfer, annotation and labeling were conducted via a purpose-built web-based system at the Institute of Fundamental Technological Research of the Polish Academy of Sciences in Poland. The workflow scheme is shown in Fig. 1. In particular, the process was divided into four steps: data acquisition (1), data collection and anonymization (2), data labeling and annotation (3) and data evaluation and extraction (4).

Data acquisition.
The data acquisition was performed by five radiologists/oncologists working at medical centers in Poland: the Breast Unit, Lower Silesian Oncology, Pulmonology and Hematology Center and Maria  The breast ultrasound examination was conducted in accordance with the BI-RADS guidelines of the ACR. The ultrasound device settings (e.g. transmit frequency or gain) were individually chosen for the patient to obtain a tumor image appropriate for interpretation. In case of suspicion of malignancy, a core needle biopsy was performed.

Data collection and anonymization.
In building the dataset, the first clinically non-standard step was to anonymize, collect and transfer the data. To protect patients' privacy, all identifiable information was removed from the images.
The anonymization was conducted at the institutions of the data origin. For each file, all DICOM tags containing sensitive or identifiable information such as patient ID, patient's name, or patient's date of birth were deleted or replaced with random values. Then, all patient-related textual information within the image (e.g. patient ID) was permanently removed. Before transfer, all anonymized images were manually reviewed to ensure that all such information had been removed.
Of the DICOM tags, only the image-related ones (i.e. Width, Height, Bit Depth, Samples Per Pixel, Photometric Interpretation, Bits Allocated, Bits Stored, High Bit, Pixel Representation, Derivation Description, Pixel Data and Sequence Of Ultrasound Regions: Region Location Min X0, Region Location Min Y0, Region Location Max X1, Region Location Max Y1, Physical Units X Direction, Physical Units Y Direction, Physical Delta X, Physical Delta Y) were preserved, as they are necessary for the proper display of the image and its subsequent analysis. To facilitate this workflow, we designed and created a purpose-built web-based platform for collecting, annotating, and labeling breast ultrasound images.
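The tag whitelisting described above can be illustrated with a minimal sketch. It assumes that a file header is available as a plain dictionary keyed by the attribute names listed in the text; the dictionary representation and the name `keep_image_tags` are illustrative assumptions and do not reproduce the platform's actual implementation.

```python
# Illustrative sketch only: a plain dict stands in for a DICOM header.
# The attribute names are copied from the list in the text; how they map
# to real DICOM keywords in a given toolkit is an assumption.
IMAGE_TAGS = frozenset({
    "Width", "Height", "Bit Depth", "Samples Per Pixel",
    "Photometric Interpretation", "Bits Allocated", "Bits Stored",
    "High Bit", "Pixel Representation", "Derivation Description",
    "Pixel Data", "Sequence Of Ultrasound Regions",
})


def keep_image_tags(header):
    """Return a copy of the header containing only whitelisted attributes."""
    return {name: value for name, value in header.items() if name in IMAGE_TAGS}


# Hypothetical header with one identifying and two image-related attributes.
example_header = {"Patient ID": "12345", "Width": 800, "Height": 600}
cleaned_header = keep_image_tags(example_header)
```

Keeping an explicit list of allowed attributes, rather than a list of attributes to delete, means that any unexpected identifying field is dropped by default.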
Data labels and annotation.
During data acquisition, patient clinical data were collected, such as age, breast tissue composition, signs, and symptoms. They were paired with the image during data uploading. The list of labels for signs/symptoms consisted of the most prevalent observed abnormalities/reported experiences. Labels of tissue composition are in accordance with BI-RADS guidelines 3.
Next, the radiologist who collected the data indicated the regions of interest using freehand annotations. In segmentation, two tissue classes were considered: (1) the tumor mask, which outlined the mass under examination, and (2) the other object mask, which was optional and concerned other suspicious lesions in the image (e.g. cyst). The boundary of the segmented lesion is in line with the measurement markers of the lesion size used during standard ultrasound scanning. For normal cases, the masks are not available due to the lack of abnormal findings. An example of image segmentation with two classes of masks is shown in Fig. 2, where yellow represents the tumor class and blue represents the other object class. For annotation, each radiologist chose a tablet with a pencil or a computer with a mouse depending on their preference.
Image annotation was followed by labeling according to BI-RADS reporting guidelines 3. Seven B-mode-based features were included. They were divided into mass-oriented features (shape, margin, echogenicity, posterior features, halo) and image-oriented features (calcifications, skin thickening). Labels of tumor orientation (parallel/not parallel) were excluded because their direct interpretation is provided in the tumor mask. For normal cases, six features are not applicable, so only skin thickening is considered. The BI-RADS reporting section concluded with the assignment of one of seven categories (BI-RADS 1, 2, 3, 4a, 4b, 4c, 5). In addition, each tumor was labeled for 15 image interpretations that reflect the radiologist's overall diagnostic impression. This list of interpretations consisted of the most prevalent diseases that are differentiated in clinical practice.

Data evaluation and extraction.
Finally, all collected data were evaluated in terms of preparation for export. Inclusion criteria were defined as female patients with tumor type confirmed by pathological diagnosis or by more than two years of follow-up care. Moreover, the final dataset includes only B-mode breast images with tumors not exceeding the image size, without measurement markers, pictograms, artifacts, and text annotations.
All images have been cropped to remove text annotations with device settings on the image sides.In non-rectangular images (from extended field-of-view imaging), the black background in the frame has been changed to transparent (see the image of BIRADS 2 in Table 2) to allow analyses requiring data limited to the image itself.Ignoring the alpha channel, the background remains black.
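A short sketch may clarify how the transparent background can be used to restrict an analysis to the image itself. It assumes that an image has already been decoded into rows of (R, G, B, A) tuples and that background pixels carry an alpha value of 0; the pixel representation and the function names are illustrative assumptions, not part of the dataset's tooling.

```python
# Illustrative sketch: pixels are (R, G, B, A) tuples, and alpha == 0 is
# assumed to mark the background added around extended field-of-view images.

def valid_pixel_mask(rgba_rows):
    """Return a boolean grid that is True where a pixel belongs to the scan."""
    return [[alpha > 0 for (_, _, _, alpha) in row] for row in rgba_rows]


def mean_intensity_inside(rgba_rows):
    """Mean of the red channel over non-transparent pixels (0.0 if none)."""
    values = [red for row in rgba_rows for (red, _, _, alpha) in row if alpha > 0]
    return sum(values) / len(values) if values else 0.0


# Tiny hypothetical 2x2 image: two scan pixels and two transparent corners.
tiny_image = [
    [(0, 0, 0, 0), (100, 100, 100, 255)],
    [(50, 50, 50, 255), (0, 0, 0, 0)],
]
inside_mask = valid_pixel_mask(tiny_image)
inside_mean = mean_intensity_inside(tiny_image)
```

If the alpha channel were ignored, the two corner pixels would enter the statistics as black and bias the mean downward.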
Each of the five radiologists contributed equally to the final dataset. The final extraction of the dataset is de-identified with respect to radiologist ID, i.e. a case is no longer associated with the radiologist. As a result, the radiologist-medical center-patient linkage is removed, so the patients' identities cannot be reasonably determined from the provided data.

Data Records
The BrEaST dataset has been made available for download at The Cancer Imaging Archive (TCIA) 21 and for viewing on the dedicated webpage 24. Additionally, a thumbnail preview of all images and lesion masks is included in the supplementary material attached to this paper.

Data characteristics.
The data were acquired from 256 adult female patients between 18 and 87 years old at the examination time. A total of 197 biopsies were performed (accounting for 77% of the dataset), confirming 98 breast cancers. The biopsy results available for BI-RADS 3 (12 such cases) are diagnoses made prior to follow-up ultrasound scanning (during which the images were acquired). In total, the dataset consists of 98 cancers, 154 benign lesions and 4 normal tissue images. The number of images of these classes for all BI-RADS categories is shown in Fig. 3.
Examples of images from each BI-RADS category with overlying annotations defining the tumor area (if applicable) are shown in Table 2. The selected images show the diversity of the released dataset. The image of BI-RADS 1 shows normal breast tissue during lactation with clearly widened milk ducts. The BI-RADS 2 image shows a lymphocele in the post-breast-conserving therapy setting. The image was acquired in extended field-of-view mode, so transparency is added to its sides. The image of BI-RADS 3 shows a tumor above a silicone implant. The images of BI-RADS 4a, 4b, 4c, and 5 show lobular carcinoma in situ, mastitis, invasive carcinoma of no special type, and invasive carcinoma of no special type with foci of sebaceous carcinoma, respectively.
The data characteristics, including all considered attributes with their definition and the prevalence of each label, are summarized in Tables 3 and 4. It should be emphasized that the Diagnosis column (of Table 4) contains "not applicable" values due to the absence of a histopathological result for some cases of BI-RADS categories 1, 2 and 3 (for BI-RADS 4a, 4b, 4c and 5, the column is completely filled in). In the absence of a histopathological diagnosis, the Interpretation column showing the radiologist's overall diagnostic impression should be used (it is also completely filled in for BI-RADS 2-5 and not applicable for BI-RADS 1).
Dataset structure.
The downloaded files are (1) a .zip file containing a folder with images and masks, and separately (2) a .xlsx file with labels.

Technical Validation
The quality of the BrEaST dataset 21 was promoted by controlling each stage of data processing and analysis. The validation process was divided into three parts: (1) regular validation of the dataset performed during the dataset development, (2) validation of annotations to check their usability in analyses, and (3) a simple analysis to validate the association of annotations with labels.

Regular validation.
The purpose-built web-based system had a fully controlled workflow and allowed radiologists to validate each stage of data processing and to report errors on an ongoing basis (e.g. improper anonymization of data). Furthermore, error handling was implemented as part of the annotation and validation framework to prevent mechanical errors (e.g., skipping the BI-RADS category when tumor annotations were completed). Finally, submitting the form with labels and annotations required double confirmation to deter accidental clicks and to ensure that blank fields reflected unavailable information rather than omissions.
After the dataset was collected and fully described, it was manually checked by a database manager and then cross-checked by another expert.The final step was checking the dataset for duplicates, using a previously developed algorithm 20 .
Validation of tumor annotations.
First, the process of annotation validation included checking whether each mask consisted of a single object. Binarization of the drawn contours sometimes set random pixels as belonging to an object (e.g. resulting from resting a hand on a tablet screen). Then, the height and width of the tumor were automatically determined based on the masks according to the ACR guidelines 3. The derived measurements (and corresponding masks) were verified by each radiologist. For all the masks discussed, the original ones were kept, as the radiologist who performed the examination had the greatest knowledge of the lesion in question. The resulting 252 pairs of measurements are presented in Fig. 4.
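The single-object check and the size measurement can be sketched as follows. The sketch assumes a binary mask given as a list of rows, 8-connectivity between pixels, and an axis-aligned bounding box as a simplified stand-in for the measurement convention of the ACR guidelines; the function names are illustrative, and the procedure actually used for the dataset is not specified at this level of detail.

```python
from collections import deque


def connected_components(mask):
    """Group foreground pixels of a binary mask into 8-connected components."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box_size(pixels):
    """Return (height, width) in pixels of the axis-aligned bounding box."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return max(ys) - min(ys) + 1, max(xs) - min(xs) + 1


# Hypothetical mask: a 2x3 lesion plus one stray pixel in the top-right corner.
example_mask = [
    [0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
found = connected_components(example_mask)
is_single_object = len(found) == 1
lesion = max(found, key=len)  # the largest component is taken as the lesion
lesion_height, lesion_width = bounding_box_size(lesion)
```

A mask that yields more than one component would be flagged for review; pixel sizes could then be converted to physical units with the Physical Delta X and Physical Delta Y values preserved in the DICOM header.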
The results obtained are in line with expectations. Markers of malignant tumors are clearly clustered above those of benign ones. This observation is reflected in the BI-RADS orientation feature, where a non-parallel orientation (vertical dimension greater than horizontal dimension) is a predictor of malignancy. The longest diameter was also taken into account (the distribution of its values is shown as a grayscale background in Fig. 4), as it is a clinically used measurement in the assessment of tumor response to treatment (also used in ultrasound 25) and is also evaluated in Tumor Node Metastasis (TNM) staging 26.
Moreover, a higher density of points can be seen at small tumor sizes. Considering the thresholds from the TNM staging, the size distribution is as follows: 10 cases within the T1a stage, 76 within T1b, 107 within T1c, 57 within T2, and 2 within T3. Therefore, 77% (n = 193) of the determined tumor dimensions are classified as T1. The skewness (equal to 1.44) of the dataset toward smaller tumors enables the development of methods to detect them at an earlier stage. Additionally, earlier diagnosis of carcinomas is crucial for effective treatment.
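The size grouping above can be reproduced with a short sketch. It assumes the size thresholds of the TNM T categories for breast tumors (1, 5, 10, 20 and 50 mm for the longest diameter) and considers size only, so T4, which depends on tumor extension rather than size, is not represented. The function name is illustrative; the counts are those reported in the text.

```python
def t_category_by_size(longest_diameter_mm):
    """Map the longest tumor diameter (mm) to a size-based T category."""
    if longest_diameter_mm <= 1:
        return "T1mi"
    if longest_diameter_mm <= 5:
        return "T1a"
    if longest_diameter_mm <= 10:
        return "T1b"
    if longest_diameter_mm <= 20:
        return "T1c"
    if longest_diameter_mm <= 50:
        return "T2"
    return "T3"


# Counts per category as reported in the text.
reported_counts = {"T1a": 10, "T1b": 76, "T1c": 107, "T2": 57, "T3": 2}

total_tumors = sum(reported_counts.values())
t1_tumors = sum(n for stage, n in reported_counts.items() if stage.startswith("T1"))
t1_share_percent = round(100 * t1_tumors / total_tumors)
```

The reported counts sum to the 252 measured tumors, and the T1 subgroups together account for 193 of them.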
Validation of annotations and labels.
An example of quantitative analysis is the assessment of tumor shape based on the masks included in the collection. One of the primary methods for evaluating shape roughness and complexity is the turning angle function (TAF) 27 of a contour, which simplifies the characterization of shapes and can be used as their signature. It is the cumulative function of turning angles, and it may be obtained by deriving the counterclockwise angle between the tangent at a segment of the contour and the x-axis, and expressing it as a function of the arc length of the segment. The perimeter of the lesions presented in Table 6 was smoothed using a moving average based on 10 points for a clearer presentation of the TAF.
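A minimal sketch of the TAF for a closed polygonal contour is given below. It assumes that the contour is an ordered list of (x, y) vertices without repeated consecutive points, that arc length is normalized by the perimeter, and that the 10-point moving average is applied circularly and approximately centered on each vertex; these choices and the function names are illustrative assumptions rather than the exact procedure used for Table 6.

```python
import math


def smooth_closed_contour(points, window=10):
    """Circular moving average of contour vertices over `window` points."""
    n = len(points)
    half = window // 2
    smoothed = []
    for i in range(n):
        idx = [(i + k - half) % n for k in range(window)]
        smoothed.append((
            sum(points[j][0] for j in idx) / window,
            sum(points[j][1] for j in idx) / window,
        ))
    return smoothed


def turning_angle_function(points):
    """Return (normalized arc length, cumulative tangent angle) per edge."""
    n = len(points)
    edges = [
        (points[(i + 1) % n][0] - points[i][0],
         points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    ]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    perimeter = sum(lengths)
    # Start from the counterclockwise angle between the first edge and the x-axis.
    angle = math.atan2(edges[0][1], edges[0][0])
    arc = 0.0
    taf = [(0.0, angle)]
    for i in range(1, n):
        previous = math.atan2(edges[i - 1][1], edges[i - 1][0])
        current = math.atan2(edges[i][1], edges[i][0])
        # Wrap the turn into [-pi, pi) so that the angles accumulate smoothly.
        turn = (current - previous + math.pi) % (2 * math.pi) - math.pi
        angle += turn
        arc += lengths[i - 1]
        taf.append((arc / perimeter, angle))
    return taf


# Counterclockwise unit square: the angle rises by pi/2 at every corner.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_taf = turning_angle_function(square)
```

For a smooth convex contour the function increases almost linearly, whereas irregular or spiculated margins appear as local oscillations around that trend.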

Usage Notes
The BrEaST dataset is available for download at TCIA 21 and for browsing (as an atlas of breast lesions along with histological diagnoses) on the dedicated webpage 24. It was created to develop and evaluate algorithms for detecting, segmenting, and classifying abnormalities in breast ultrasound scans. Applications of the BrEaST dataset may include:
• Training and testing models for localizing lesions in images (available masks for multi-tumor images, non-cropped images with skin layer and no visible markers);
• Training and testing models for segmenting lesions in images (provided masks created manually by experienced radiologists);
• Training and testing models for classifying lesions in images (available BI-RADS category and classification into benign/malignant);
• Testing methods using the dataset as a benchmark, which can increase the interpretability of the models' performance by filtering or grouping labels (available e.g. BI-RADS features, diagnoses, interpretations, signs and symptoms).
The released dataset has some limitations that need to be addressed in the future, including:
• The number of cases for some diagnoses is limited due to their rare prevalence in the population (e.g., invasive papillary carcinoma or sebaceous carcinoma). Therefore, machine learning algorithms trained on the BrEaST dataset to diagnose rare diseases may suffer from class imbalance. Nevertheless, this is useful information that, added to the benign/malignant labels, expands the field of research. For example, it enables grouping of tumors by invasiveness (non-invasive vs. pre-invasive vs. invasive lesions) to enhance the interpretability of the lesion classification model (e.g., misclassification of pre-invasive lesions).
• Only a few normal cases (no lesion present) are included in the database for model evaluation, but these cases can be supplemented from other sources (Table 1).
tissue composition, signs and symptoms. The second part was adding image-level freehand annotation identifying the tumor and other abnormal areas in the image. Then, the tumor and image were labeled with BI-RADS category, BI-RADS descriptors, and interpretation of critical findings as presence of breast diseases. The final labels regarded the method of verification, tumor classification and histopathological diagnosis (33 diagnoses).

Fig. 1 Overview of the dataset workflow: (1) images and clinical data acquisition at medical centers; (2) anonymization, transfer and insertion into the database; then (3) preparation of manual (freehand) annotations and labels; finally, (4) evaluation and export as the BrEaST dataset.

Fig. 2 An example of a breast lesion image (a) and its segmentation into two classes (b), tumor area (marked in yellow) and areas of other abnormalities (marked in blue).

Fig. 3 Distribution of normal, benign and malignant cases for all BI-RADS categories.

Fig. 4 The height and width of the tumors determined from masks drawn by radiologists. The grayscale background shows the probability density function computed from the maximum diameter.

Table 1. An overview of publicly available breast ultrasound datasets and the dataset presented in this work.

Table 3. The dataset characteristics for clinical patient attributes and BI-RADS descriptors.

Table 5. Examples of two rows (for better readability shown as columns) from the .csv file describing the dataset. The '&' sign separates multiple elements in a field.
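Fields that hold several elements can be split on the '&' sign when the label file is read. The sketch below assumes that surrounding whitespace is not significant; the function name and the example value are hypothetical and are not taken from the dataset.

```python
def split_multi_value(field):
    """Split a '&'-separated field into a list of trimmed, non-empty elements."""
    return [part.strip() for part in field.split("&") if part.strip()]


# Hypothetical field value used for illustration only.
example_elements = split_multi_value("first element&second element")
```

An empty field yields an empty list, which keeps missing values distinct from single-element fields.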