Background & Summary

Laparoscopic surgery is a commonly used technique that facilitates minimally-invasive surgical procedures as well as robot-assisted surgery and entails several advantages over open surgery: reduced length of hospital stay, less blood loss, more rapid recovery, better surgical vision and, especially for robotic procedures, more intuitive and precise control of surgical instruments1,2. Meanwhile, a lot of the information in the image is not used, because the human attention is not able to process this immense amount of information in real time. Moreover, anatomical knowledge and medical experience are required to interpret the images. This barrier represents a promising starting point for the development of Artificial Intelligence (AI)-based computer-based assistance functions.

The rapidly developing methods and techniques provided by the usage of AI, more precisely the automated recognition of instruments, organs and other anatomical structures in laparoscopic images or videos, have the potential to make surgical procedures safer and less time-consuming3,4,5,6. Open-source laparoscopic image datasets are limited, and existing datasets such as Cholec807, LapGyn48, SurgAI9 or the Heidelberg Colorectal Data Set10 mostly comprise image-level annotations that allow the user to differentiate whether or not the structure of interest is shown in an image without giving information about its specific spatial location and appearance. However, such pixel-wise annotations are required for a variety of machine learning tasks for image recognition in the context of surgical data science11. In a clinical setting, such algorithms could facilitate context-dependent recognition and thereby protection of vulnerable anatomical structures, ultimately aiming at increased surgical safety and prevention of complications.

One major bottleneck in the development and clinical application of such AI-based assistance functions is the availability of annotated laparoscopic image data. To meet this challenge, we provide semantic segmentations that provide information about the position of a specific structure by annotations of each pixel of an image. Based on video data from 32 robot-assisted rectal resections or extirpations, this dataset offers a total amount of 13195 extensively annotated laparoscopic images displaying different intraabdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands) and anatomical structures (abdominal wall, inferior mesenteric artery, intestinal veins). For a realistic representation of common laparoscopic obstacles, it features various levels of organ visibility including small or partly covered organ parts, motion artefacts, inhomogeneous lighting and smoke or blood in the field of view. Additionally, the dataset contains weak labels of organ visibility for each individual image.

Adding anatomical knowledge to laparoscopic data, this dataset bridges a major gap in the field of surgical data science and is intended to serve as a basis for a variety of machine learning tasks in the context of image recognition-based surgical assistance functions. Potential applications include the development of smart assistance systems through automated segmentation tasks, the establishment of unsupervised learning methods, or registration of preoperative imaging data (e.g. CT, MRI) with laparoscopic images for surgical navigation.

Methods

This dataset comprises annotations of eleven major abdominal anatomical structures: abdominal wall, colon, intestinal vessels (inferior mesenteric artery and inferior mesenteric vein with their subsidiary vessels), liver, pancreas, small intestine, spleen, stomach, ureter and vesicular glands.

Video recording

Between February 2019 and February 2021, video data from a total of 32 robot-assisted anterior rectal resections or rectal extirpations performed at the University Hospital Carl Gustav Carus Dresden was gathered and contributed to this dataset. The majority of patients (26/32) were male, the overall average age was 63 years and the mean body mass index (BMI) was 26.75 kg/m2 (Table 1). All included patients had a clinical indication for the surgical procedure. Surgeries were performed using a standard Da Vinci® Xi/X Endoscope with Camera (8 mm diameter, 30° angle, Intuitive Surgical, Item code 470057) and recorded using the CAST-System (Orpheus Medical GmbH, Frankfurt a.M., Germany). Each record was saved at a resolution of 1920 × 1080 pixels in MPEG-4 format and lasts between about two and ten hours. The local Institutional Review Board (ethics committee at the Technical University Dresden) reviewed and approved this study (approval number: BO-EK-137042018). The trial, for which this dataset was acquired, was registered on clinicaltrials.gov (trial registration ID: NCT05268432). Written informed consent to laparoscopic image data acquisition, data annotation, data analysis, and anonymized data publication was obtained from all participants. Before publication, all data was anonymized according to the general data protection regulation of the European Union.

Table 1 Patient characteristics.

Temporal annotation of videos

The surgical process was temporally annotated by one medical student with two years of experience in robot-assisted rectal surgery (MC, FMR) using b<>com *Surgery Workflow Toolbox* [Annotate] version 2.2.0 (b<>com, Cesson-Sévigné, France), either during the surgery or retrospectively, according to a previously created annotation protocol (Supplementary File 1), paying particular interest to the visibility of the abovementioned anatomical structures. Ubiquitous organs (abdominal wall, colon and small intestine), intestinal vessels, and vesicular glands were not specifically annotated temporally.

Frame extraction

To achieve a highly diverse dataset, videos from at least 20 different surgeries were considered for each anatomical structure. From each considered surgical video, up to 100 equidistant frames were randomly selected from the total amount of video data displaying a specific organ. As a result, this dataset contains at least 1000 annotated images from at least 20 different patients for each organ or anatomical structure. The number of images extracted and annotated per organ and surgery as well as the number of segments and the mean proportions of non-segmented background per organ are listed in Table 2.

Table 2 Distribution of images in the dataset per organ and surgery.

For anatomical structures without a temporal annotation (abdominal wall, colon, intestinal vessels, small intestine and vesicular glands), sequences displaying the specific organ were selected and merged manually using LossLessCut version 3.20.1 (developed by Mikael Finstad). Random frames were extracted from the merged video file using a Python script (see section “Code availability”). The extraction rate (extracted frames per second) was adjusted depending on the duration of the merged video to extract up to 100 images per organ per surgery. Images were stored in PNG format at a resolution of 1920 × 1080 pixels.

For liver, pancreas, spleen, stomach and ureter, temporal annotations served as a basis for the frame-extraction process using the abovementioned Python script. Based on a TXT file with temporal annotations of organ presence, equidistant frames were extracted from respective sequences for each organ as outlined above.

The resulting frames were audited and images that were not usable (e.g. the organ is not visible because it is concealed completely by an instrument, the complete field of view is filled with smoke, severely limited visibility due to a blurred camera) were excluded manually.

No automated filtering processes were applied to specifically select or avoid images (e.g. based on mutual information). To maintain the variability inherent to intraoperative imaging, no image preprocessing steps such as adaptation of image intensity or contrast, or window size) were performed. Images were directly extracted from the videos recorded during surgery, converted into PNG (lossless). These images were then directly annotated.

The resulting dataset includes over 1000 images from at least 20 surgeries for each anatomical structure (Fig. 1).

Fig. 1
figure 1

Overview of the data acquisition and validation process. Based on temporal annotations of 32 rectal resections, three independent annotators semantically segmented every single image with regard to the pixel-wise location of the respective organ. These segmentations were merged and individual segmentations were reviewed alongside the merged segmentation by a physician with considerable experience in minimally-invasive surgery, resulting in the final pixel-wise segmentation (left panel). Moreover, every single image was classified with regard to the visibility of all individual anatomical structures of interest by one annotator and independently reviewed (right panel).

Semantic segmentation

For pixel-wise segmentation, we used 3D Slicer 4.11.20200930 (https://www.slicer.org) including the SlicerRT extension, an open-source medical image analysis software12. The anatomical structures were manually semantically segmented with the Segment Editor function using a stylus guided tablet computer running Microsoft Windows. The settings made during segmentation were “scissors”, operation “fill inside”, shape “free form”, slice cut “symmetric”. As a guideline we generated a segmentation protocol that describes inclusion criteria for each considered anatomical structure in detail (Supplementary File 2). Each individual image was semantically annotated according to this guideline by three medical students with basic experience in minimally-invasive surgery. Thus, exactly one specific anatomical structure was finally segmented in each image (e.g. the colon was pixel-wise annotated in each of the 1374 colon images). In addition, one multi-organ-segmentation dataset was created out of the 1430 stomach frames. The stomach dataset was chosen for this purpose because these images very often show various organs, such as the colon, small intestine or spleen as well as the abdominal wall. Subsequently, the three individual annotations were automatically merged (see section “Code availability”). Individual annotations alongside merged segments were reviewed and adjusted by a physician with three years of experience in minimally-invasive surgery. Figure 1 gives an overview over the image generation and verification process. Example annotations are provided in Fig. 2.

Fig. 2
figure 2

Sample images of each anatomical structure. The figure displays a raw image (left column), the three pixel-wise annotations and the merged annotation (middle column) as well as the final reviewed segmentation (right column). The three annotations are shown as red, green and blue lines. The merged version and the final reviewed segmentation are displayed as white transparent surfaces.

Weak labeling

Weak labels provide information about the visibility of different anatomical structures in the entire image. Weak labels were annotated by one medical student with basic experience in minimally-invasive surgery and reviewed by a second one in each frame (Fig. 1).

The complete dataset is accessible at figshare13.

Data Records

The Dresden Surgical Anatomy Dataset is stored at figshare13. Users can access the dataset without prior registration. The data is organized in a 3-level folder structure. The first level is composed of twelve subfolders, one for each organ/anatomical structure (abdominal_wall, colon, inferior_mesenteric_artery, intestinal_veins, liver, pancreas, small_intestine, spleen, stomach, ureter and vesicular_glands) and one for the multi-organ dataset (multilabel).

Each folder contains 20 to 23 subfolders for the different surgeries that the images have been extracted from. The subfolder nomenclature is derived from the individual index number of each surgery. Each of these folders contains two versions of 5 to 91 PNG-files, one raw image that has been extracted from the surgery video file and one image that contains the mask of the expert-reviewed semantic segmentation (black = background, white = segmentation). The raw images are named imagenumber.png, (e.g. image23.png), the masks are named masknumber.png (e.g. mask23.png). In the multilabel folder there are separate masks for each of the considered structures visible on the individual image (e.g. masknumber_stomach.png). The image indices always match for associated images.

Each surgery- and organ-specific folder furthermore contains a CSV file named weak_labels.csv that contains all information about the visibility of the eleven regarded organs in the respective images. The columns in these CSV files are ordered alphabetically: Abdominal wall, colon, inferior mesenteric artery, intestinal veins, liver, pancreas, small intestine, spleen, stomach, ureter and vesicular glands.

Additionally, the folders anno_1, anno_2, anno_3 and merged can be accessed from the surgery- and organ-specific subfolders. These folders contain the masks generated by the different annotators and the automatically generated merged version of the masks, each in PNG format.

Technical Validation

To merge the annotations of the three different annotations for each image in the dataset, the STAPLE algorithm14, which is commonly used for merging different segmentations in biomedical problems, was applied. Each annotator received the same weight. The merged annotations were then, together with the original segmentations of the annotators, uploaded to a segmentation and annotation platform called CVAT (https://github.com/openvinotoolkit/cvat)15 hosted at the National Center for Tumor Diseases (NCT/UCC) Dresden. The physician in charge of reviewing the data could then log-in, select the most appropriate annotations for each image and, if necessary, adjust them.

To evaluate the extent of agreement between the segmentations of the individual annotators and the merged annotation with the final annotation of each image, we computed two standard metrics for segmentation comparison16:

  • F1 score, which showcases the overlap of different annotation with a value of 0 to 1 (0: no overlap, 1: complete overlap)

  • Hausdorff distance, a distance metric, which calculates the maximum distance between a reference annotation and another segmentation. Here we have normalized the Hausdorff distance via the image diagonal, resulting in values between 0 and 1, which 0 indicates that there is no separation between the two segmentations and 1 meaning there is a maximum distance between the two.

The results of this comparison can be found in Table 3, sorted according to the different tissue types. The table shows that for most organs there is no large discrepancy between the merged annotations and the final product, with most F1 scores being over 0.9 indicating a large overlap and the low value for the Hausdorff distance indicating that no tendencies for over or under-segmentation were present. Only the F1 score for the ureter class seems to indicate that the expert annotator had to regularly intervene, though the difference still seems to be minimal as indicated by the low Hausdorff distance.

Table 3 Comparison of the different annotators and their merged annotation to the final annotation.

Most annotators also seemed to regularly agree with the final annotation, though not always with the same degree as the merged annotation, justifying the fusion via STAPLE. Similar to the merged annotations, there were larger discrepancies in regard to the ureter class. Generally though, at least two annotators seemed to largely agree with the expert annotations.

Usage Notes

The provided dataset is publicly available for non-commercial usage under the Creative Commons Attribution CC-BY. If readers wish to use or reference this dataset, they should cite this paper.

The dataset can be used for various purposes in the field of machine learning. On the one hand, it can be used as a source of further image material in combination with other, already existing datasets. On the other hand, it can be used to create organ detection algorithms working either with weak labels or with semantic segmentation masks, for example as a basis for further development of assistance applications17. Proposed training-validation-test splits as well as results of detailed segmentation studies are reported in a separate publication 18.