A computed tomography vertebral segmentation dataset with anatomical variations and multi-vendor scanner data

With the advent of deep learning algorithms, fully automated radiological image analysis is within reach. In spine imaging, several atlas- and shape-based as well as deep learning segmentation algorithms have been proposed, allowing for subsequent automated analysis of morphology and pathology. The first “Large Scale Vertebrae Segmentation Challenge” (VerSe 2019) showed that these perform well on normal anatomy, but fail in variants not frequently present in the training dataset. Building on that experience, we report on the largely increased VerSe 2020 dataset and results from the second iteration of the VerSe challenge (MICCAI 2020, Lima, Peru). VerSe 2020 comprises annotated spine computed tomography (CT) images from 300 subjects with 4142 fully visualized and annotated vertebrae, collected across multiple centres from four different scanner manufacturers, enriched with cases that exhibit anatomical variants such as enumeration abnormalities (n = 77) and transitional vertebrae (n = 161). Metadata includes vertebral labelling information, voxel-level segmentation masks obtained with a human-machine hybrid algorithm and anatomical ratings, to enable the development and benchmarking of robust and accurate segmentation algorithms.


Background and Summary
4][15][16][17] However, most of these approaches are highly data dependent, as the algorithms require extensive datasets with corresponding metadata for development, training and validation to enable efficient models [18]. Aiming to improve automated quantification of spinal morphology and pathology through vertebral labelling and segmentation, the first iteration of the "Large Scale Vertebrae Segmentation Challenge" (VerSe 2019) [17,19] was held at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2019, Shenzhen, China). The segmentation challenge received considerable participation from the scientific community, with more than 250 registrations and 20 participating teams [19]. As part of the VerSe 2019 challenge, a large dataset was released to address the previous severe shortage of publicly available, large, accurately annotated CT spine data in the community: 160 CT image series with voxel-level annotations, covering a wide variety of fields of view and spatial resolutions as well as spinal and vertebral pathologies [20].
Building on the data, experience, and lessons learned from the VerSe 2019 challenge, we proposed to organise a second iteration of the vertebrae segmentation challenge at MICCAI 2020 in Lima, Peru. For the VerSe 2020 challenge, we aimed to substantially increase the existing dataset to 300 subjects. While retaining the richness of the VerSe 2019 dataset, the VerSe 2020 dataset was complemented with images from different institutions and four major scanner manufacturers. In addition, we aimed to include rare anatomical variants such as numeric aberrations and cervicothoracic or lumbosacral transitional vertebrae. As such variants have a low prevalence in the population, they are under-represented, if present at all, in unselected training datasets like that presented at the VerSe 2019 challenge.
Consequently, deep learning-based algorithms fail in such cases. Thus, the VerSe 2020 dataset was deliberately enriched with rare anatomical variants to improve the performance of derived models, as previously described [21]. We envisioned the creation of an extended annotated CT dataset to provide consistent and reliable ground truth data for algorithm training and benchmarking.
The proposed VerSe 2020 dataset was released as part of the second iteration of the "Large Scale Vertebrae Segmentation Challenge" hosted at the MICCAI conference held in Lima, Peru (https://verse2020.grand-challenge.org/) [19]. The dataset was split into a training dataset, a public test dataset, and a private test dataset. It builds on the preexisting VerSe 2019 dataset [20] published for the MICCAI conference in 2019, with an overlap of 105 CT image series, and now comprises 319 image series of 300 subjects. To date, this dataset represents the largest publicly available CT imaging dataset of the spine with corresponding metadata, including labelling information, voxel-level segmentations of all fully visualized vertebrae, and definitions of enumeration abnormalities and transitional vertebrae.
In summary, the successful segmentation challenges held at the MICCAI conferences in 2019 and 2020 based on these public datasets confirm that reliable, fully automated deep learning algorithms for segmentation of the spine can be trained and that algorithm performance benefits from large and diverse datasets.
Therefore, we regard the VerSe 2020 dataset as an important step towards clinical translation of computer-aided diagnosis (CADx) algorithms for spine imaging, which may soon supplement the radiologist's work in daily routine. We are convinced that in the near future, patients will greatly benefit from CADx extracting even more relevant information from medical imaging than currently possible.

Subject Selection
This retrospective evaluation of imaging data was approved by the local institutional review board and written informed consent was waived (Proposal 27/19 S-SR).
Inclusion criteria for the dataset: We included subjects older than 18 years who had received CT imaging of the spine showing a minimum of 7 fully visualized vertebrae, not counting sacral or transitional vertebrae. Exam dates were limited to the time between February 5th, 2016 and March 1st, 2020. The minimum required spatial resolution was defined as 1.5 mm in the craniocaudal direction, 1 mm in the anterior-posterior direction and 3 mm in the left-right direction to allow for a sufficient delineation of vertebral deformities [22]. As in VerSe 2019, traumatic fractures and bony metastases were excluded. Other osseous changes such as Schmorl nodes, hemangioma, degenerative changes, or the presence of foreign materials for kyphoplasty or spondylodesis intentionally remained part of the dataset to reflect the widest possible spectrum of spine morphology.
Aiming at a >100% increase compared to the 141 subjects from VerSe 2019 and at a dataset with 50% multivendor data and 50% anatomical variants, we composed a dataset consisting of 300 subjects, including 86 subjects from VerSe 2019 and 214 new subjects. In order to select new subjects, we searched the institutional picture archive and communication system (PACS) regarding two aspects: 1) CT studies imported from other institutions that were acquired on scanner hardware different from that installed in our institution; 2) CT reports documenting the presence of spinal anatomical variants including enumeration abnormalities, transitional vertebrae or cervical ribs. In both queries, we aimed for a balanced composition of cases. From the first query, we selected 20 subjects from external Toshiba scanners, 20 subjects from external GE scanners and 30 subjects from external Siemens scanners. We added 30 subjects from Ben Glocker's dataset [23] and 50 subjects from VerSe 2019 (10 from Siemens scanners and 40 from Philips scanners) to form the multivendor dataset with 150 subjects.
The query for anatomical variants of the spine, such as thoracolumbar and lumbosacral transitional vertebrae, cervical ribs, thoracic and lumbar short ribs as well as enumeration variants, returned 308 subjects. In addition to 36 VerSe 2019 subjects with anatomical variants, we selected another 114 subjects to form the 150 subjects of the dataset with anatomical variants. The final dataset comprised 300 subjects with a total of 319 image series, as some subjects contributed two separate image series (e.g. thoracic spine and lumbar spine). Subject characteristics and data subset stratification are listed in Table 1. All selected imaging series were categorized by their primary attribute (e.g. Toshiba scanner, Castellvi grade 4 transitional vertebra, or numeric aberration with 4 lumbar vertebrae), and each subgroup was randomly split into the training, public validation and private test subsets as demonstrated in Figure 1.
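The random split within each primary-attribute subgroup can be illustrated with a minimal sketch. It is not the procedure actually used for VerSe 2020: the function name, the subset names, the fixed seed, and the round-robin assignment into three roughly equal parts are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical subset names; the paper describes three subsets.
SUBSETS = ("training", "public_test", "private_test")


def stratified_split(series, seed=0):
    """Assign each image series to a subset, stratified by primary attribute.

    series: iterable of (series_id, primary_attribute) pairs.
    Returns a dict mapping series_id to a subset name.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    groups = defaultdict(list)
    for series_id, attribute in series:
        groups[attribute].append(series_id)

    assignment = {}
    offset = 0  # carried across subgroups so overall subset sizes stay balanced
    for attribute in sorted(groups):
        members = sorted(groups[attribute])
        rng.shuffle(members)  # random order within the subgroup
        for i, series_id in enumerate(members):
            assignment[series_id] = SUBSETS[(offset + i) % len(SUBSETS)]
        offset += len(members)
    return assignment
```

With this scheme, a subgroup of three series contributes exactly one series to each subset, so rare attributes are represented in training and in both test sets wherever the subgroup size allows.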
Despite the wide inclusion criteria, very rare variants might still be missing, and osseous changes other than the included anatomical variants or cases with new foreign material may still limit the generalizability of the dataset. Also, the sacrum, including lumbar transitional vertebrae, i.e. vertebrae partially fused with the sacrum, has not been segmented in this dataset. Future works, potentially based on the presented data, will need to address complete segmentation of the spine including the sacrum. Another future focus may be the inclusion of more pathological changes: only two cases with half-vertebrae are included in VerSe 2020, and metastases and dislocated traumatic fractures were excluded. In this regard, it first remains to be discussed how these pathologies should be labelled and segmented.

Vertebral Segmentation
The labelling of the vertebrae and the segmentation process are essential steps in processing spine data.
All subsequent analyses, such as the detection and grading of fractures, the calculation of bone mineral density, and the analysis of spinal shape, curvature, and deformity such as scoliosis, rely on these initial tasks.
For the proposed dataset, we used a semi-automated, in-house developed algorithm to generate segmentation masks of the vertebrae step by step, as illustrated in Figure 2. First, the CT input data was anonymized by conversion to Neuroimaging Informatics Technology Initiative (NIfTI) format (https://nifti.nimh.nih.gov/nifti-1). To ensure full anonymity, defacing was achieved by deleting the raw data within a manually segmented mask of the face. In case of the 86 re-used VerSe 2019 subject datasets, the spatial resolution has previously been reduced to 1 mm isotropic or in sagittal 2 mm/3 mm series to kept. Second, a deep-learning framework (publicly accessible at https://anduin.bonescreen.de) was used to label and segment individual vertebrae. In brief, a low-spatial-resolution CNN was first used to detect all osseous spinous structures, resulting in a low-resolution heatmap. This heatmap was used to automatically generate a bounding box containing the spine. Next, the Btrfly Net was used to label the vertebrae, with the option to manually correct the centroids if needed.5][26] These vertebral masks were created at 1 mm³ isotropic resolution and subsequently merged into one multi-label segmentation mask with individually labelled vertebrae. Originally, both Btrfly Net and U-Net were trained with the VerSe 2019 data and have continuously been improved by repeated retraining on the labels and segmentation masks derived from this dataset. Third, the segmentation output of the algorithm was manually corrected in a laborious process by two specifically trained medical students using the open-source software ITK-SNAP [27]. This manual correction was performed in the original imaging space but was limited to an accuracy of 1 mm, similar to the output of the U-Net; i.e. segmentation errors in structures smaller than 1 mm were not corrected. Finally, corrections were reviewed and corrected or approved by a neuroradiologist to achieve the highest possible consistency of the presented segmentation masks.
Despite the good performance of the baseline algorithm, the correction of on average 15 objects of interest with approximately 10⁵ voxels took considerable effort.
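The merging of individual vertebral masks into one multi-label segmentation mask can be sketched as follows. This is an illustrative sketch only, not the in-house implementation: masks are represented as flattened lists of 0/1 values (in practice these would be 3D arrays on a common 1 mm grid), the function name is hypothetical, and letting the higher label win in overlapping voxels is an assumption.

```python
def merge_masks(binary_masks):
    """Merge per-vertebra binary masks into one multi-label mask.

    binary_masks: dict mapping a positive integer vertebra label to a
                  flattened binary mask (list of 0/1), all of equal length.
    Returns a flat list where 0 is background and any other value is the
    label of the vertebra occupying that voxel.
    """
    lengths = {len(mask) for mask in binary_masks.values()}
    if len(lengths) > 1:
        raise ValueError("all masks must cover the same voxel grid")
    n_voxels = lengths.pop() if lengths else 0

    merged = [0] * n_voxels
    # Assumption: where two masks overlap, the higher label overwrites.
    for label in sorted(binary_masks):
        if label <= 0:
            raise ValueError("labels must be positive; 0 is background")
        for index, value in enumerate(binary_masks[label]):
            if value:
                merged[index] = label
    return merged
```

A single multi-label volume of this kind keeps every vertebra individually addressable by its label while storing only one mask file per image series.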

Anatomical Variants
As anatomical variants of the spine are frequently observed, both numeric and morphologic variants have been intentionally included in the VerSe 2020 dataset. Tins and Balain reported numeric anatomical variations of the spine to be more frequent (7.7%) than transitional vertebrae (3.3%), with a tendency of male subjects to show more additional vertebrae and of female subjects to show more missing vertebrae [28]. Numeric aberrations of the cervical spine are rare, but additional cervical ribs can be frequently observed in a clinical setting with a prevalence of 0.05%-6.1% [29]. According to the literature, variations of the thoracolumbar region are rarely reported and often overlooked, despite potential clinical implications, e.g. for surgical planning. Variants of the lumbosacral region are frequently observed and, according to Thawait et al. [29], can be found in 4%-30% of examined cases. Wigh et al. differentiated accessory ossification centres and stump ribs from the last pair of ribs at a thoracic vertebra based on length and morphology and defined any vertebra with ribs shorter than 38 mm as a transitional thoracolumbar vertebra [30]. While there are different approaches to classify transitional vertebrae, computer algorithms need a clear definition of the part of the spine to which a vertebra belongs. Therefore, in our dataset, vertebrae with ribs longer than 38 mm on either side and a typical diagonal downward alignment were classified as thoracic vertebrae. In cases with both ribs shorter than 38 mm, e.g. when only small ossification centres with a horizontal alignment were present, the vertebra was considered lumbar. In ambiguous cases, additional morphological features, such as the shape of the vertebra and the orientation of the articular joint facets, were used to identify a vertebra as thoracic or lumbar.
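The rib-length criterion described above can be written as a small decision rule. The sketch below covers only the 38 mm length threshold; rib alignment and the additional morphological features used in ambiguous cases are not modelled, and the function name, the convention of passing 0 for an absent rib, and the "ambiguous" return value are assumptions made for illustration.

```python
RIB_THRESHOLD_MM = 38.0  # length threshold stated in the text


def classify_thoracolumbar(left_rib_mm, right_rib_mm):
    """Classify a thoracolumbar junction vertebra by rib length only.

    Pass 0.0 for an absent rib (assumed convention). Returns "thoracic",
    "lumbar", or "ambiguous" when the lengths alone do not decide.
    """
    if left_rib_mm < 0 or right_rib_mm < 0:
        raise ValueError("rib lengths must be non-negative")
    # A rib longer than 38 mm on either side: thoracic vertebra.
    if left_rib_mm > RIB_THRESHOLD_MM or right_rib_mm > RIB_THRESHOLD_MM:
        return "thoracic"
    # Both ribs shorter than 38 mm: lumbar vertebra.
    if left_rib_mm < RIB_THRESHOLD_MM and right_rib_mm < RIB_THRESHOLD_MM:
        return "lumbar"
    # Remaining cases (a rib of exactly 38 mm) need further features.
    return "ambiguous"
```

For example, `classify_thoracolumbar(45.0, 30.0)` yields "thoracic", while a vertebra with only a small 10 mm ossification centre on one side yields "lumbar".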
Regarding the lumbosacral region, we used the well-established Castellvi classification to grade lumbosacral transitional vertebrae, which is mainly based on the morphology of the transverse process of the last lumbar vertebra and whether it is fused with the sacrum or not [31]. Following the Castellvi classification, we did not segment vertebrae that showed partial fusion with the sacrum (Castellvi grades III and IV) and did not include them for further analysis. Of note, in our database search, no "lumbalized" sacral vertebra with four sacral vertebrae remaining fused could be identified.
If present in the scan, vertebrae were labelled starting at the first cervical vertebra. If T1 was not visible within the scan's field-of-view, the thoracic spine was considered to have 12 vertebrae, as enumeration errors in the thoracic spine are much less frequent compared to the lumbar spine [28].

Data Repositories and Storage
The VerSe 2020 dataset comprises 319 CT image series from 300 subjects and 4142 vertebrae, encompassing 581 cervical, 2255 thoracic, and 1306 lumbar vertebrae, as listed in Table 1. The stratification of anatomical variants in the VerSe dataset along with the corresponding ratings is listed in Table 2. The dataset, with its division into training, public test, and private test subsets, has been made publicly available under the Creative Commons license CC BY-SA 4.0 and is hosted at the Open Science Framework (https://osf.io/t98fz/). More information regarding the segmentation algorithms submitted by the participants of the MICCAI VerSe challenges in 2019 and its second iteration in 2020 can be found at https://verse2019.grand-challenge.org/ and https://verse2020.grand-challenge.org/ as well as in the publication by Sekuboyina et al. [19]. Of note, there is an overlap of 86 subjects and 105 imaging series between the VerSe 2020 dataset and the previously published VerSe 2019 imaging dataset, which is separately available for public use under the Creative Commons license CC BY-SA 4.0 at https://osf.io/nqjyw/. In a previous publication, Loeffler et al. used the VerSe 2019 dataset to automatically detect and grade vertebral fractures and to calculate bone mineral density from a subset of the provided scans [20].

Data Structure and File Formats
All medical imaging files were converted into Neuroimaging Informatics Technology Initiative (NIfTI) format (https://nifti.nimh.nih.gov/nifti-1). Segmentation masks are also saved in NIfTI format, and labels of all 4142 segmented vertebrae are provided in JSON format.
For organizational reasons of the segmentation challenge, all CT data (NIfTI format) in the VerSe dataset was separated into a training dataset (100 subjects), a public test dataset (100 subjects), and a private test dataset (100 subjects) as previously described and demonstrated in Figure 1. Corresponding metadata is provided in the additional documents with the datasets.
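Reading the JSON vertebral labels can be sketched with the Python standard library. The schema used here is hypothetical: the text does not specify the file layout, so the list-of-objects structure, the "label" field name, and the consecutive numbering from C1 = 1 to L5 = 24 are assumptions for illustration and should be checked against the files in the repository.

```python
import json
from collections import Counter


def region_of(label):
    """Map a vertebra label to a spinal region (assumed numbering)."""
    if 1 <= label <= 7:
        return "cervical"
    if 8 <= label <= 19:
        return "thoracic"
    if 20 <= label <= 24:
        return "lumbar"
    return "other"  # e.g. codes for additional vertebrae, not modelled here


def count_regions(json_text):
    """Count labelled vertebrae per region in one label file.

    json_text: JSON string holding a list of objects; objects without a
               "label" field (e.g. header entries) are skipped.
    """
    entries = json.loads(json_text)
    labels = [e["label"] for e in entries if isinstance(e, dict) and "label" in e]
    return Counter(region_of(label) for label in labels)
```

Applied to every label file of a subset, such per-region counts could be compared against the totals listed in Table 1 as a quick consistency check.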

CT Imaging and Scan Provenience
CT scans were intentionally chosen to be heterogeneous to ensure the best possible training and generalization of the algorithms. Therefore, data from the four major scanner vendors (Philips, Siemens, Toshiba, and GE), acquired on a variety of different multidetector CT scanner types of each vendor, was included. The largest share of images (45.8%) was acquired on Philips scanners, followed by 32.3% on Siemens, 6.3% on Toshiba, and 6.3% on GE scanners, as shown in Table 1 and, at the patient level, in Supplement 1.
There is no information regarding the scanner vendor provenience of the Glocker dataset [23]; these scans are therefore listed as scans of "unknown" origin. Part of the examinations was carried out with additional administration of oral and/or intravenous contrast medium from various manufacturers. All included imaging series were based on edge-enhancing reconstructions, as this is the clinical standard for bone CT imaging. Because the dataset comprises multi-centre imaging data from different scanner vendors, isotropic data was not available in all cases, replicating a typical clinical scenario.

Technical Validation
The presented medical imaging data was derived from the institutional picture archiving system and therefore fully complies with the legal standards and quality controls for the acquisition of medical imaging in Germany and the European Union, as well as the industrial standards of the scanner vendors.
Segmentation masks were prepared and annotated at voxel level by a human-machine hybrid algorithm with manual checks and corrections by specifically trained medical students. Afterwards, the masks were reviewed, corrected, and finally approved by a neuroradiologist. The anatomical ratings were carried out by two neuroradiologists in a consensus reading. The resulting NIfTI datasets have successfully been processed by all dockers of the 13 participants of the VerSe 2020 challenge. Of these, the best performing docker achieved a mean vertebral identification rate of 95.6% with a mean localisation error of less than 2 mm. Concerning segmentation, the best mean Dice score was 91.7%.
All authors state that there are no conflicts of interest regarding the presented manuscript and published data.
Lumbosacral vertebrae graded according to the Castellvi Classification.
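The Dice score reported for segmentation can be computed per vertebra from a predicted and a reference multi-label mask. The sketch below uses the standard definition, 2|A∩B| / (|A| + |B|); it is not the challenge's evaluation code. Masks are flattened lists of integer labels, the function names are hypothetical, and skipping vertebrae absent from both masks is an assumed convention.

```python
def dice_score(prediction, reference, label):
    """Dice coefficient for one vertebra label in two multi-label masks.

    prediction, reference: flattened lists of integer labels, equal length.
    Returns a value in [0, 1], or None if the label occurs in neither mask.
    """
    if len(prediction) != len(reference):
        raise ValueError("masks must cover the same voxel grid")
    n_pred = sum(1 for p in prediction if p == label)
    n_ref = sum(1 for r in reference if r == label)
    if n_pred + n_ref == 0:
        return None  # label absent from both masks: score undefined
    overlap = sum(
        1 for p, r in zip(prediction, reference) if p == label and r == label
    )
    return 2.0 * overlap / (n_pred + n_ref)


def mean_dice(prediction, reference):
    """Average Dice over all labels present in the reference mask."""
    labels = sorted({r for r in reference if r != 0})
    scores = [dice_score(prediction, reference, label) for label in labels]
    return sum(scores) / len(scores) if scores else None
```

A mean Dice score of 91.7% corresponds to a value of 0.917 on this scale.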

Figure 1: Composition of the VerSe 2020 dataset: original data derived from the preexisting

Figure 2: Schematic overview on the image processing by the in-house developed algorithm

Table 1: Subject characteristics of the VerSe 2020 dataset and subset stratification. *Unknown scans were included from a public dataset [19].

Table 2: Subjects with cervical, thoracolumbar and lumbosacral anatomical variants.