Multi-scanner and multi-modal lumbar vertebral body and intervertebral disc segmentation database

Magnetic resonance imaging (MRI) is widely utilized for diagnosing and monitoring of spinal disorders. For a number of applications, particularly those related to quantitative MRI, an essential step towards achieving reliable and objective measurements is the segmentation of the examined structures. Performed manually, such process is time-consuming and prone to errors, posing a bottleneck to its clinical applicability. A more efficient analysis would be achieved by automating a segmentation process. However, routine spine MRI acquisitions pose several challenges for achieving robust and accurate segmentations, due to varying MRI acquisition characteristics occurring in data acquired from different sites. Moreover, heterogeneous annotated datasets, collected from multiple scanners with different pulse sequence protocols, are limited. Thus, we present a manually segmented lumbar spine MRI database containing a wide range of data obtained from multiple scanners and pulse sequences, with segmentations of lumbar vertebral bodies and intervertebral discs. The database is intended for the use in developing and testing of automated lumbar spine segmentation algorithms in multi-domain scenarios. Measurement(s) Vertebral Body • Intervertebral Disc Technology Type(s) Magnetic Resonance Imaging Measurement(s) Vertebral Body • Intervertebral Disc Technology Type(s) Magnetic Resonance Imaging


Background & Summary
Magnetic resonance imaging (MRI) is the modality of choice for detecting and visualizing almost all spinal disorders, as it provides non-invasive soft tissue visualization with excellent contrast, more detailed compared to other modalities [1][2][3][4][5] . Thus, it is widely utilized in orthopedic and neurosurgical diagnostics, ranging from dedicated imaging in scoliosis, intervertebral disc disease, and osteoporosis to injuries including vertebral fractures as well as to computer-assisted surgical intervention planning and guidance [1][2][3][4]6 .
Visual image reading represents the standard approach in the clinical setting for evaluation of routine spine MRI. Reporting derived from visual image assessment by the radiologist may be enhanced by using the approach of structured reporting (e.g., use of predefined formats and terms to create radiological reports, using a high level of standardized and organized information in template context [7][8][9] ) and by implementing semi-quantitative grading schemes (e.g., Pfirrmann classification for lumbar disc degeneration 10 ). Going one step further, generating quantitative measures for specific anatomical structures along the spine would be welcome to be able to provide meaningful objective markers related to spinal disorders, which could facilitate patient phenotyping, clinical management, and adequate treatment selection. A robust and precise segmentation of vertebral bodies and intervertebral discs is a major step towards a reliable diagnosis of various conditions in automated and computer-assisted systems, as well as for quantitative MRI regarding extraction of image-based markers 11-15 . However, automated spine segmentation has not yet made the transition to clinical routine and remains a challenging problem due to the variable and complex shape of the spine anatomy, as well as the presence of noise and other artefacts in imaging data 6,16 . Moreover, routine spine MRI acquisitions pose several challenges for achieving robust and accurate segmentations, which is mostly due to some unavoidable MRI acquisition characteristics and pitfalls. Specifically, these include the presence of partial volume effects related to anisotropic spatial resolution, non-homogeneous intensities between central and marginal areas of the spine due to bias field artefacts, and the non-existence of standardized measurement units (unlike Hounsfield Units as used for computed tomography [CT]) 6 . This is additionally highlighted in large multi-site studies, where variations in scan parameters often produce images that vary significantly in contrast and quality, especially in cases where multiple diagnostic MRI sequences are used among different clinical practices 17,18 .
Up to now, diagnostic and analysis methods of MRI data predominantly rely on manual annotation of anatomical objects, such as vertebrae and intervertebral discs. This process is time-consuming and subjective, and often prone to intra-and inter-annotator variability 17,19 . Of note, the high time expenditure of manual segmentation may hamper direct use, particularly when multi-level segmentations are desired (e.g., segmentation of the entire lumbar spine). Thus, automating the segmentation process would strongly benefit clinicians and scientists, especially for large-scale and multi-site studies targeting quantitative MRI. However, a reliable automated segmentation method, which has the potential to be included in standard clinical routines in the future, has to be able to generalize to the large variety of MRI sequences and parameter settings, while being reasonably fast and not overly complex to operate. This remains a difficult challenge, despite the surge in recent developments of automated segmentation techniques 18,20 . www.nature.com/scientificdata www.nature.com/scientificdata/ Most existing methods focus on one particular pulse sequence, which is not sufficient to adequately acknowledge the frequent multi-sequence acquisitions in clinical routine, or they may suffer from considerably long computational time, as well as the requirement for user input and navigation 21,22 . Other methods require extensive prior knowledge, such as shape information, to construct a representative preliminary model of anatomical shape from training data and to approximate optimization to new data 16,23,24 . Such methods are clearly limited by prior knowledge, which -in the case of large-scale and multi-site studies -should contain enough representative information to handle all possible variations in data. Current constraints in data collection and data sharing pose a challenge in acquiring enough of such data, thus producing highly specific models that are often only applicable to a small variety of cases. Automated medical image analysis algorithms should be robust enough to inherent data variability to ensure their successful integration into existing infrastructure, with the long-term perspective of becoming clinically feasible. However, without enough representative data comprising all of the diverse variations in spine MRI, algorithm generalizability is hard to achieve.
Recent advances in deep learning-based segmentation methods hold a lot of promise to alleviate the challenges described above 21,[25][26][27][28] . This seems especially true for improving the generalization on multi-site and multi-scanner data. However, used algorithms require large datasets annotated by experts for both development and testing. Yet, publicly available datasets of annotated vertebrae and intervertebral discs, collected from multiple scanners and different acquisition protocols, are very limited. Therefore, the purpose of this article and its related dataset is to provide a reference database containing a wide range of MRI data obtained from multiple scanners and involving various pulse sequences, along with the segmentations of lumbar vertebral bodies (L1 to L5) and intervertebral discs.  Table 2. Scan characteristics with image sequences per scanner vendor and model type for the first scan. *T1w, nc T1-w, T2-w, T1-fs, STIR, and T2-w Dix. stand for T1-weighted contrast-enhanced, T1-weighted noncontrast-enhanced, T2-weighted, T1-weighted fat-saturated, short tau inversion recovery, and T2-weighted Dixon sequences.
Using a database like the herein presented, automatization in image analysis and processing could be further facilitated, which could directly influence the field of radiology and pave the way towards semi-automated and fully-automated algorithms for radiological diagnostics. As such, over the recent years, automated spine image analysis has seen a growing interest, particularly for the detection of vertebral fractures 29 , assessment of spinal deformities 21 , and computer-assisted surgical interventions 30 . As the lumbar spine is a particularly common site for various spinal disorders that can cause chronic low back pain (LBP), segmentation of anatomical structures along the lumbar spine is of considerable interest. While many factors can contribute to spinal disease and deformation, such as fractures, accidental injuries, osteoporosis, vertebral neoplasm or scoliosis 31 , the primary cause of chronic LBP in most patients is believed to be related to spinal degeneration, particularly of lumbar intervertebral discs and endplates [32][33][34][35][36][37] . Spinal degeneration is a natural consequence of aging, but it can be accelerated by trauma, repetitive stress, systemic disease, and other factors 35,[38][39][40][41] . Thus, investigating the relevance between chronic LBP pathology and lumbar vertebrae and intervertebral disc morphology and composition using specific segmentations could aid in developing appropriate methods to assist early diagnosis, provide better surgical planning, and facilitate individualized treatment strategies and patient phenotyping.
In summary, we offer a database of manually segmented lumbar vertebral bodies and intervertebral discs in MRI datasets collected from a variety of scanners and using different pulse sequence protocols. Besides the images that are part of standard scanning routines, such as non-contrast-enhanced and contrast-enhanced T1-weighted, T2-weighted, and short tau inversion recovery (STIR) images, we provide the access to images obtained with a DIXON turbo spin-echo (TSE) sequence. Thus, combined with other available imaging sequences, this data could help in the development of more efficient and robust methods of segmenting musculoskeletal structures, and it could facilitate quantitative MRI by achieving automated segmentation and analysis of datasets, which is useful for assessing vertebral bodies (e.g., fat fraction [42][43][44][45][46] ), cartilage endplates (e.g., T2* 47,48 ), and intervertebral discs (e.g., T1rho mapping 44,48,49 ). The data including segmentations can be used as training and test datasets for (semi-)automated algorithms, which can potentially benefit from the heterogeneous character of this study's database, as it fosters the development of approaches that are more generalizable to data derived from multiple sites and scanners, which is known to be a major challenge for models that are typically trained on highly homogeneous data.

Methods
Patient cohort. This retrospective study was approved by the local Institutional Review Board (Ethikkommission der Technischen Universität München). The requirement for written informed study consent was waived due to the retrospective character of the specific analyses used for this publication. Yet, all patients provided standard informed consent for MRI scanning during clinical routine and agreed on the day of visit for MRI acquisition on an opt-in basis that their data may be used for scientific purposes. The patients were informed that their data may be anonymously publicly shared without any personal identifiable information, and the analyses for this study were performed on de-identified data.  Table 3. Scan characteristics with image sequences per scanner vendor and model type for the follow-up scan. *T is the time interval between the two scans. **T1-w, nc T1-w, T2-w, T1-fs, STIR, and T2-w Dix. stand for T1-weighted contrast-enhanced, T1-weighted non-contrast-enhanced, T2-weighted, T1-weighted fat-saturated, short tau inversion recovery, and T2-weighted Dixon sequences.
The database contains MRI datasets collected from these 34 patients, whereby 21 patients were rescanned at a second time point using a different MRI scanner model and/or sequence protocol (mean interval between first and second MRI scan: 8.3 ± 12.6 months, interval range: 0.5-60.8 months). From the remaining 13 patients only MRI datasets of one imaging time point were used. MRI scans acquired between August 2014 and October 2019 were considered in this study, with the data being acquired either on institutional-intern scanners or on scanners of other institutions, thus being available after image transfer due to clinical requests. A detailed list of pulse sequences and MRI vendors per patient, together with the time intervals between scans, are shown in Tables 2 and 3.
The field of view (FOV) covered at least the lumbar spine. Only scans that were performed in supine position were included. Each patient had at least one MRI scan performed on a Philips system with the following reference sequences: www.nature.com/scientificdata www.nature.com/scientificdata/ Division of Medical and Biological Informatics, Medical Imaging Interaction Toolkit, Heidelberg, Germany). The manual segmentations were done in the sagittal plane for each vertebral body and intervertebral disc, respectively. Segmentation was performed by a medical doctor. Regions of interest (ROIs) were carefully drawn at the boundaries of the vertebrae and intervertebral discs, respectively. The other image planes were used to check for accidentally included structures not belonging to the vertebral bodies or intervertebral discs. In these cases, ROIs were corrected accordingly. For lumbar vertebral bodies, each vertebrae was separately enclosed in sagittal slices, avoiding any inclusion of paraspinal tissue or fluid. Posterior elements were not considered, thus restricting the segmentations to the vertebral corpora only. Analogously, lumbar intervertebral discs were also separately segmented in sagittal slices, considering the entire disc of the whole circumference. All segmentations were supervised by a radiologist with eleven years of experience. Segmentation time per one T1-weighted image series amounted to 60-90 min.
The obtained segmentation labels, generated in non-contrast-enhanced T1-weighted images of all patients of one or two MRI sessions, were then overlaid over the rest of the available sequences acquired with different MRI scanners and pulse protocols. Due to the differences in imaging and scanning parameters, the original images were registered to the non-contrast-enhanced T1-weighted images using the 3D linear registration tool in ITK-SNAP (https://www.itksnap.org), ensuring that labels were accurately overlaid. We resampled each scan in the space of the respective non-contrast-enhanced T1-weighted image using linear interpolation with an identity transform 50,51 , given that we have the case of two or more scans of the same patient with different www.nature.com/scientificdata www.nature.com/scientificdata/ coordinate transformations to patient space. Once resampled, additional verification was performed to ensure all segmentation masks are accurately overlaid over all available sequences per patient. The verified data were subsequently uploaded to the database.
Examples of segmented lumbar vertebral bodies and intervertebral discs are shown in Figs. 1-4. In detail, Fig. 1 shows two scans acquired on scanners from differing vendors (Philips and Siemens) per each available sequence. Similar is shown in Fig. 2 for another vendor (Philips and GE). Furthermore, Fig. 3 shows the differences between scans of the same patient acquired from the same vendor's scanners (Philips), which differ in models. Figure 4 provides three exemplary patient cases with segmentations shown on T1-weighted imaging in presence of pathological findings that may render manual segmentations particularly demanding (e.g., vertebral fracture, spondylodiscitis, and metastatic bone lesions).
The manual segmentations of each vertebral body and intervertebral disc are available as separate binary masks, where pixels with an intensity value of 1 correspond to the tissue of interest, while pixels of an intensity value of 0 belong to the background. Each mask of each image volume was stored as a separate *.nii file. In total, each available volume is accompanied with the corresponding segmentations of lumbar vertebral bodies and intervertebral discs.

Data Records
The database is available online within the OSF Repository, including the different imaging datasets, segmentation files, as well as metadata comprising patient characteristics and details on the available pulse sequences and used MRI systems per patient (Identifier: https://doi.org/10.17605/OSF.IO/QX5RT) 52 . Furthermore, Python scripts to read and visualize the data by utilizing open-source image analysis libraries -SimpleITK (https:// simpleitk.org/) and NiBabel (https://nipy.org/nibabel/) -are provided.
Non-contrast-enhanced and contrast-enhanced T1-weighted, T2-weighted, STIR, and T1-weighted fs images are stored as separate datasets for each patient per scanner model. In the case of the T2-weighted DIXON sequences, sagittal water, fat, and in-phase images are deposited as separate datasets for each patient, if available. The segmentation maps of each vertebra (L1 to L5) and intervertebral discs (L1/2 to L4/5) are stored as *.nii files. All imaging data, as well as the segmentation maps, are saved using the Neuroimaging Informatics Technology Initiative (NIfTI) format (https://nifti.nimh.nih.gov/). NIfTI files have several features, such as raw data saved in 3D, containing two affine coordinates to relate voxel to spatial index, as well as additional data such as key acquisition parameters, encoding directions, and grid spacing, saved as a part of the header. NIfTI files can be directly read in a number of programming environments, such as Python, Matlab, and R, as well as directly visualized using tools such as ImageJ (https://imagej.nih.gov/ij/), ITK-SNAP (http://www.itksnap.org/), FSL (https://fsl.fmrib.ox.ac. uk/fsl/fslwiki), AFNI (https://afni.nimh.nih.gov/), and FreeSurfer (https://surfer.nmr.mgh.harvard.edu/).
Datasets of patients and corresponding segmentation masks are labeled with the same patient ID. Masks of each vertebra are labeled as L1 to L5, while the masks for each intervertebral disc are labeled as L1_2, L2_3, L3_4, and L4_5.

technical Validation
Acquisition of MRI was medically indicated and image quality control was performed during and/or immediately after completion of the exam by the technologist and radiologist in charge. Furthermore, inclusion of image data in the current study was performed by a board-certified radiologist with eleven years of experience based on a sufficient image quality.