Image-based algorithmic software segmentation is an increasingly important topic in many medical fields. Algorithmic segmentation is used for medical three-dimensional visualization, diagnosis or treatment support, especially in complex medical cases. However, accessible medical databases are limited, and valid medical ground truth databases for the evaluation of algorithms are rare and usually comprise only a few images. Inaccuracy or invalidity of medical ground truth data and image-based artefacts also limit the creation of such databases, which is especially relevant for CT data sets of the maxillomandibular complex. This contribution provides a unique and accessible data set of the complete mandible, including 20 valid ground truth segmentation models originating from 10 CT scans from clinical practice without artefacts or faulty slices. From each CT scan, two 3D ground truth models were created by clinical experts through independent manual slice-by-slice segmentation, and the models were statistically compared to prove their validity. These data could be used to conduct serial image studies of the human mandible, evaluating segmentation algorithms and developing adequate image tools.
Machine-accessible metadata file describing the reported data (ISA-tab format)
Background & summary
Software programs for processing medical image data have become a central tool in many medical disciplines1 and are subject to continuous ongoing technological developments. This is especially true because the total number of data sets has increased with the wider application of image-processing techniques in most clinical centres, the improving resolution of new image scanner generations has allowed medical image data sets to more accurately record anatomy, and the time required to computationally reconstruct image data has been reduced2. Clinically, the produced data can be used for image-based 3D visualizations of complex anatomical structures to support medical diagnosis and/or therapeutic planning procedures3,4. In the field of craniomaxillofacial surgery, this approach is used for complex facial bone trauma, trauma-related or congenital craniomaxillofacial deformities and orthognathic surgery5,6.
Therefore, improvements in the software used for 3D reconstruction and volume rendering generated by image-based segmentation algorithms are needed7,8. Image-based segmentation methods are algorithmic software tools used for radiological image data. These image data mostly originate from computed tomography (CT) data sets9,
In the craniomaxillofacial (or head and neck) complex, automated and/or semi-automated segmentation techniques have previously been used for the lower jaw to investigate their use in clinical diagnosis and treatment support13,
For example, previously published works in this field include a CT data collection of lower jaws from sharks19 and a collection of CT data for the Reference Image Database to Evaluate Therapy Response (RIDER)20. However, accessible medical human CT libraries, especially clinically relevant ground truth databases (a gold standard database for segmentation programs/algorithms that is similar to a control group for image-based segmentation programs/algorithms to assess segmentation accuracy/quality), for the evaluation of image-based segmentation approaches or software processes are limited. In particular, fully segmented 3D ground truth models of complete human anatomical bone structures are lacking. Therefore, the authors of this work are currently investigating multiple image-based open-source software segmentation approaches for such ground truth models21. Such CT data set collections and ground truth segmentations of the same anatomical structure provide resources for conducting serial image studies of the human mandible, evaluating segmentation algorithms and developing adequate image tools to assess clinical or scientific use.
Valid ground truth models of complete anatomical structures are needed to evaluate the practical use of segmentation algorithms, which are used in many clinical centres for 3D visualization or in diagnosis and treatment support in complex medical cases for both clinical and research purposes. Furthermore, such ground truth models can be used in medical image segmentation challenges, such as the Multimodal Brain Tumor Segmentation Challenge (BraTS) (https://braintumorsegmentation.org/).
This contribution provides a description of valid fully segmented ground truth CT data sets of the complete anatomy of the mandibular bone without artefacts or missing slices (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5). The ground truth data were created by medical specialists and statistically tested for validation. These virtual 3D ground truth models can be applied to compare and analyse 3D visualizations, as well as the functional stability or accuracy of image-based segmentation algorithms in medicine. These uses may include investigations and testing of commercial and conventional segmentation approaches or other current and future image-based processing techniques.
Moreover, valid 3D ground truth data of complete anatomical structures provide a visual resource for outreach, education and/or further research purposes. This possibility represents a novel application of these data. For example, these 3D models can be used for the further development of segmentation approaches, for physical reproduction using rapid prototyping or for integration into 3D PDF documents as interactive figures. Moreover, the data collection represents an objectively created, valid and homogeneous sample collective that can be used as a representative control group in comparative assessments. This point is especially relevant since the ground truth data used for this collection originate from routine clinical practice.
Some data in this collection were used in part in previously published works17,21,
However, the data description presented in this manuscript has additional value because it includes complete data descriptions and data parameters, detailed results and the full data sets for a human mandibular CT library. In contrast to other works in which slices with artefacts were excluded from the segmentation process18, leading to incomplete data sets, the new value of the data description provided with this manuscript is that it includes complete segmented valid ground truth data and 3D masks of the mandible without artefacts or missing slices.
This study was approved by the ethics committee of the Medical University of Graz, Austria (EK-29-143 ex 16/17, Medical University of Graz, Austria), and informed consent was obtained from all subjects.
In this investigation, the human lower jaw was selected as the anatomical structure of choice because it is the largest and strongest bone in the craniomaxillofacial complex24. Furthermore, the lower jaw is of high clinical relevance because it has the highest occurrence of facial fractures in the craniomaxillofacial field, representing approximately 40% of such fractures, and is therefore often involved in trauma injury25. Clinically, the lower jaw is frequently involved in time-consuming trauma cases in which computer-assisted image-based segmentation algorithms are used to support diagnosis and treatment procedures13,
Forty-five original CT data sets of the craniomaxillofacial complex were provided as DICOM files and collected during routine clinical practice in the Department of Oral and Maxillofacial Surgery at the Medical University of Graz in Austria. Only high-resolution data sets (512 × 512) with slices not exceeding a spacing of 2.0 mm with a 0.5 × 0.5 mm2 pixel size that provided physiological, complete mandibular bone structures without teeth were included in the selection process17. Furthermore, no distinction was made between atrophic and non-atrophic mandibular bones; as this condition occurs physiologically, both types of bone were included during the selection process21.
However, incomplete data sets consisting of mandibular structures altered by iatrogenic or pathological factors or fractured mandibles, as well as data sets showing osteosynthesis materials in the lower jaw, were excluded from this trial. These data sets were excluded to avoid an inhomogeneous collection of data sets. To ensure the creation of high-quality ground truth data and a homogeneous ground truth model collection, only complete physiological data sets with clear bone contours and anatomical structures without artefacts were used, according to the mentioned inclusion criteria17.
All data sets were acquired within a twelve-month period (between 2013 and 2017). According to the inclusion criteria, 20 CT data sets were selected; 25 data sets were excluded during the selection process in routine clinical practice for diagnosis and treatment reasons. From these 20 CT data sets, 10 data sets (n = 10; 6 male, 4 female) were further selected to form an experimental segmentation group for 3D ground truth volume creation17,21,22. The CT scans were performed with a Siemens Somatom Sensation 64 medical scanner using a standardized scanning protocol and a B30f reconstruction filter. Table 1 describes the details and the scanning parameters and shows the physiological information for the selected data.
3D ground truth segmentation
The segmentation group ultimately consisted of 10 comprehensively chosen data sets. These image series were imported into the medical platform MeVisLab (https://www.mevislab.de/). To create the ground truth volumes, a manual slice-by-slice segmentation of the randomly selected lower jaw data sets was carried out in MeVisLab by two clinical experts, specialized maxillofacial surgeons A and B, who both had the same level of experience in segmentation. Additionally, one specialized radiologist supported the ground truth segmentation process in the background. More precisely, each data set was segmented manually by clinical experts A and B through the manual outlining of bone structures in each slice to create two independent ground truth schemes (ground truth A and B) for each data set17,21 (Fig. 1).
All selected CT data sets were completely anonymized and converted from DICOM files into single Nearly raw raster data (NRRD) files. Segmentation was performed manually slice by slice for the creation of the ground truth data (MeVisLab 2.5.2., software, MeVis Medical Solutions AG, Fraunhofer Institut, Bremen, Germany)26,
The data sets were compared by defined parameters, as follows: Ground truth A vs. Ground truth B.
After ground truth segmentation, the created 3D models were tested for validity. The accuracy and congruence of the segmented volumes were assessed to identify any systematic inaccuracy or observer-dependent subjectivity due to the manual creation of the ground truth data. This process was performed by a comparative assessment of the two ground truth models for each data set (n = 20) using defined parameters: 1) the Dice score coefficient (DSC)30,31, 2) the Hausdorff distance (HD)32, 3) the segmentation volume and 4) the number of voxels (voxel units). These parameters are commonly used objective standard sizes in the evaluation of various techniques for volume and image rendering28,29.
In more detail, the DSC is the agreement between two binary volumes and is calculated as follows. It measures the relative volume overlap between R1 and R2, where R1 and R2 are the binary masks from two segmentations. V(·) is the volume (e.g., in mm3) of voxels inside a binary mask, obtained by counting the number of voxels, then multiplying that value by the voxel size (Equation (1)).
The HD between two binary volumes is defined by the Euclidean distance between the boundary voxels of the masks. Given again the sets R1 and R2 of two segmentations that consist of the points that correspond to the centres of the segmentation mask boundary voxels in the two images, the directed HD h(A,R) is defined as the minimum Euclidean distance from any of the points in the first set to the second set, and the HD between the two sets H(A,R) is the maximum of these distances33 (Equation (2)).
Descriptive statistical calculations were used to summarize the measurements made from the 3D models, including minimum values, maximum values, mean values and standard deviations for each of the five parameters defined above. Analytical statistical methods consisted of the calculation of paired t-tests (p) to confirm statistical validity values, the calculation of Pearson’s product-moment correlation coefficient (r)34,
The data for this manuscript have been stored in a figshare repository (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5). The data sets are 10 completely anonymized CT scans of the human craniomaxillofacial complex that originally consisted of tomographic images (slices) in DICOM format from routine clinical practice. The data sets all include complete physiological mandibles, which are partly atrophic due to patient age and missing teeth but are not altered by any pathology. The corresponding mandible contours (mandibular ground truth segmentations) are provided within this data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5). To anonymize the data (visual de-identification), a block was laid over the eyes of the reconstructed 3D data sets, similar to so-called censor bars in the eye area of standard 2D photos (Fig. 3). For each patient, the size (width, height and depth) and position of a block that could cover both patient eyes completely were determined. Afterwards, all voxels in the area of the block were overwritten with one value. This procedure makes it impossible to reconstruct the original grey values and thus impossible to identify the patient’s face. However, the mandible is not affected by this procedure because it is located far below the eyes.
Only this manuscript includes the complete data description, the detailed results and the full data sets of the described CT library. In particular, the complete valid ground truth data and 3D masks are only available within this manuscript.
Medical CT scanners used for data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5) are regularly subjected to quality control evaluations and are certified medical products, since they are routinely used in daily clinical practice. CT scanners are under the responsibility of a qualified medical physicist.
In routine clinical CT image artefacts, shading can occur near objects of high contrast, such as between bone and soft tissues. The occurrence of these artefacts can be triggered by technical factors but especially by metal materials, such as dental restorations or osteosynthesis plates in the patient37. These artefacts can strongly limit diagnostic procedures or image-based segmentation processes, especially if such data sets are used for 3D visualizations of a structure of interest. To obtain valid data sets and avoid such artefacts and limitations, only CT data without teeth and metal materials in the lower jaw were selected17,21. Although image-based software programs for the reduction of metal artefacts in the craniomaxillofacial complex exist to potentially improve the diagnostic accuracy of CT images, such as the iterative metal artefact reduction (iMAR) algorithm38,39, we did not use such image modifications in this work to provide an original data collection. Therefore, a homogeneous data collection without significant artefacts was created to avoid discrepancies between the created ground truth volumes and the scanned anatomical structure caused by artefacts or by missing or damaged slices17.
The parameters of the CT scanners were set to optimally visualize the mandible and the bone structures of the craniomaxillofacial complex since the CT data were clinically generated for diagnostic reasons. Due to clinical expertise and well-established anatomical cortical structures in the CT scans, manual segmentation was feasible; additional statistical validation of the segmented volumes was performed according to the defined assessment parameters. Segmented 3D image reconstructions were not interpolated between slices to ensure that the resulting 3D models were as accurate as possible (interpolation will probably save time but could introduce some inaccuracies in the visualized results).
Although automatic software segmentation usually provides fast and smooth 3D models from the image slices, we did not use any automatic segmentation software, such as Materialise®, for ground truth model creation. This decision was made to generate ground truth models that are as accurate as possible and created by medical experts without the support of automatic software segmentation. This approach is especially important for atrophic mandibles due to the presence of less calcification. Optimized grey scales and high bone density can reduce these limitations, but a certain so-called image-based smoothed transition remains at the border between hard and soft tissue. In contrast, clinical experts usually interpret anatomical structures more precisely than a standardized algorithm or software program, especially if two experts each segment the same structure independently. However, clinical experts do need more time than a computer algorithm for the segmentation of an anatomical structure. Consequently, the extensive effort and human resources required for this process are probably why accessible manual ground truth segmentations or data collections created directly by clinical experts are rare.
To avoid further inaccuracies that probably would have occurred during the manual segmentation process due to inter-observer variability or subjectivity, the ground truth models were statistically tested for validation after the segmentation was completed by the two clinical experts.
All data sets were successfully segmented by the clinical experts without interruptions and were saved as a 3D binary mask. Descriptive and analytical statistical calculations testing the validity of the data are presented in Tables 2, 3, 4. The segmentation times of the manual slice-by-slice segmentations were 40.4±4.43 min (ground truth A) and 40.9±4.77 min (ground truth B) on average (Tables 2 and 3). The mean volume values of the manual slice-by-slice segmentations were 31.28±10.46 cm3 (ground truth A) and 31.42±10.48 cm3 (ground truth B). On average, the number of voxels in the segmented data sets was 131260±52273.3 for ground truth A and 131154.5±50420.2 for ground truth B. The overlap agreement between the two manual segmentations (A, B) yielded a Dice score of 94.09±1.17%. The calculated HD between the two manual segmentations (A, B) was 3.87±1.21 voxel units. These values both represent a close agreement between the sets of 3D models, since the Dice score values are high, showing a high percentage of agreement in the 3D models, and the HD was low, showing small voxel differences between the sets of 3D models. The calculated differences between the ground truth segmentations (A, B) were not significant for the assessment parameters volume and voxels (p > 0.05) (Fig. 4a and c). Furthermore, the product-moment correlation coefficient (Pearson) in the regression models for volume and voxels was close to the value one and was not below 0.99 (r > 0.99) when the two segmentation groups were compared (Fig. 4b and d).
The measurement values of the segmentation volumes and voxels in the regression models were localized closely along the regression lines (Fig. 4b and d). The resulting boxplots are similar in terms of the volume and voxel values between the segmentations made by clinical experts A and B (Fig. 4a and c). This comparison is also valid for the parameters DSC and HD, as shown in Fig. 5a and b. Both parameters were similar in a comparison between manual ground truth segmentations A and B (Tables 3 and 4).
High DSC values, low HD values, strong positive correlations and measurement values close to the regression line were achieved between the compared segmentation models without significant variability or inaccuracy, which demonstrates strong accordance between the segmentation models and the valid ground truth data.
For a visual assessment, Fig. 6 presents the overlap of two manual ground truth segmentations (grey and turquoise) using a 3D visualization of one case. Since the ground truth schemes were created manually according to the number of image slices without modifying or interpolating slices, a clear tomography-based 3D visualization is achieved in the reconstructed 3D segmentation model. The slice-by-slice ground truth segmentation is truly visualized as a stepwise surface countering, meaning each visualized step defines one slice.
However, for a more quantitative assessment of the agreement between the 3D models, interested users could use a program like CloudCompare (https://www.danielgm.net/cc/) to compare pairs of models. This process is similar to a study that compared 3D models of fossils produced with different techniques40.
Although the use of many image-based segmentation algorithms has been reported for the craniomaxillofacial complex41,
If available, medical ground truth models are often incomplete or include so-called faulty slices due to skipped slice series or radiological artefacts during the creation of these models. Especially for the craniomaxillofacial complex, such incomplete ground truth models are not relevant because parts of the anatomical structures are missing. For example, artefacts from dental restorations, which can be present in multiple locations in a single patient, usually include between 20–30 slices depending on the slice thickness of the data set (Fig. 7). Additionally, 3D medical ground truth models are strongly associated with certain inaccuracies due to inter- and intra-observer deviation in the segmentation process during ground truth creation, which can lead to subjectivity or invalid models21.
However, this contribution provides a description of an accessible and unique data collection of 10 valid ground truth segmentation models of the complete mandible originating from routine clinical practice without artefacts or faulty slices (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5). The ground truth data were created by medical specialists17,21 and statistically tested for validation to avoid invalid models. These data provide resources for conducting serial image studies of the human mandible, evaluating segmentation algorithms and developing adequate image tools for clinical or scientific use. The accuracy and functional stability of segmentation algorithms can be further tested for reproducibility in comparison to other open-source based image tools or CE-certified medical software products, such as MIMICS (https://biomedical.materialise.com/mimics), with the provided data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5), clinically and/or scientifically.
The presented clinical CT data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5) may be of high relevance because image-based software programs have gained importance in various medical fields in recent years47, and the evaluation of image-based software segmentation is therefore a central point of interest in related computer-assisted workflow procedures both clinically and scientifically7,8,48,
The main research highlights of the presented work are as follows:
Quality evaluation of the ground truth segmentations and comparisons were performed by calculating DSCs30,31 and HDs32 and performing statistical analysis to directly compare the volume, voxel, DSC and HD values17] of the created ground truth models.
A unique clinical CT image data set collection of the mandible, fully segmented ground truth data and 3D masks of the same anatomical structure are provided and have been statistically tested for validity for reproducible use by others.
The CT image data set collection, ground truth data and 3D masks were originally created without artefacts or missing slices to provide complete data sets.
The provided data can be used for the further assessment or development of varying segmentation approaches, for reproducible investigation of image-processing tools or for individual research purposes (please see the Acknowledgements section for more information).
Despite the extensive data presented in this work, there are several areas for future work, such as the addition of more data over time and the generation of more manual ground segmentations (by clinical experts from other institutions) to create an expanding clinical CT data platform for the investigation of previously and newly developed image-processing tools or software tools.
The anonymized CT scan data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5) is stored as single .nrrd files, e.g., “pat1.nrrd”, and can be processed with common and free medical platforms, such as MeVisLab (https://www.mevislab.de/)51, 3D Slicer (https://www.slicer.org)33 and MITK (www.mitk.org)52, and tools such as MATLAB (https://www.mathworks.com). 3D models (Figs 3 and 8) can be generated and exported in various formats by using the software packages mentioned above. In addition to these file formats, we provide CT scans as stacked TIFFs and manual segmentations in the standard 3D file format STL to increase accessibility for more potential users53. The .nrrd files were converted to stacked TIFFs with Slicer 4.6.254, and the manual segmentation masks were converted to STL files with MeVisLab 2.755 and the WEMIsoSurface module. The corresponding mandible contours (ground truth segmentations) will be provided as CSO files, whereby an “a” stands for the segmentations of physician A and a “b” stands for the segmentations of physician B. Moreover, we encoded the segmentation times in the filenames, for example, “Pat 1a (36 min 15 sec).cso” or “Pat 1b (39 min 43 s)”. The CSO format was used in this work to save the individual outlines of the clinical experts. These CSO data are not altered in any way. In addition, we will also provide the manual segmentations as voxelized 3D masks in .nrrd format (note: in contrast to the outlines, these masks are filled). Thereby, the masks fit to the corresponding patient file, for example “mask1.nrrd” belongs to “pat1.nrrd” and so on (note: we used the same file name convention described for the CSOs, e.g., “Pat 1a (36 min 15 sec) Manual 3D Mask.nrrd”). The corresponding files are in the same coordination system, which means they can be loaded and overlaid without any registration (Fig. 3). The CT data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5) is stored as .nrrd files and can be used to test and compare conventional freely available image-based algorithms or commercial financially available software processing tools for the mandible for clinical or scientific use. The quality and accuracy of software programs can be evaluated in comparison to the appropriate ground truth data provided as CSO files using, for example, DSC and HD calculations, as described in this manuscript. Therefore, the ground truth data set described in this work can be used as a control group in comparative assessments of image-based computer programs and segmentation algorithms that are tested on CT data for the mandible. Additionally, 3D models can be produced when software programs are tested on this data collection (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.6167726.v5) and superimposed on the ground truth CSO data for a visual assessment. 3D image reconstructions of structures of interest can be further generated for quantitative and visual testing of morphological variability.
Finally, we should mention that we did not perform any postprocessing of the masks, e.g., 3D surface smoothing, because we did not want to alter the manual slice-by-slice segmentations, which would alter the ground truth.
For research purposes, the data can freely be downloaded, but we kindly ask investigators to cite our work. The data provided within this work are free to share — copy and redistribute the material in any medium or format. Furthermore, the data are free to adapt — remix, transform, and build upon the material for any purpose. The data within this work are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/).
The data files attached to this manuscript can be opened in ImageJ with good results and can be displayed at full quality. The files can also be opened in Irfanview, but the data do not display at full quality.
How to cite this article: Wallner, J. et al. Computed tomography data collection of the complete human mandible and valid clinical ground truth models. Sci. Data. 6:190003 https://doi.org/10.1038/sdata.2019.3 (2019).
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wallner, J., & Egger, J. Figshare https://doi.org/10.6084/m9.figshare.6167726.v5 (2018)
This investigation was approved by the internal review board (IRB) of the Medical University of Graz, Austria (IRB: EK-29-143 ex 16/17). This work received funding from the Austrian Scientific Fund (FWF-KLIF): “Enfaced: Virtual and Augmented Reality Training and Navigation Module for 3D-Printed Facial Defect Reconstructions” (KLI-678-B31; PIs: Jürgen Wallner and Jan Egger), the TU Graz Lead Project (Mechanics, Modeling and Simulation of Aortic Dissection) and CAMed (COMET K-Project 871132), which is funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) and the Austrian Federal Ministry for Digital and Economic Affairs (BMDW) and the Styrian Business Promotion Agency (SFG). Some of the data in this collection were partly used in previously published works17, 22, 23. In these publications, the data were used as testing data for single algorithms or deep learning networks. Some CT data sets were used to test a deep learning network22 in the mandible for fully automatic network-based segmentation and to test the accuracy of the licence-free open-source algorithm GrowCut (https://www.growcut.com) in the mandible17. In this investigation17, parts of the CT data that were used for algorithmic testing were initially uploaded as a figshare repository for reproducibility reasons because of the journal’s requirements for publication. The overall HD and DSC results presented in this publication were partly presented within a conference paper21. Further, one CT dataset was used in .stl (standard triangle language) file format as a surface model without segmentation to evaluate a software tool for computer-aided planning of miniplate positioning for oral and maxillofacial surgery23. However, only this manuscript includes the complete data description, the detailed results and the full data sets of the described CT library. In particular, the complete valid ground truth data and 3D masks are only presented within this manuscript.