Multicenter dataset of multi-shell diffusion MRI in healthy traveling adults with identical settings

Multicenter diffusion magnetic resonance imaging (MRI) has recently drawn considerable attention because of the growing need for large-scale brain imaging studies, but variability in MRI scanners and data acquisition tends to confound reliable individual-level analysis of diffusion measures. In addition, a growing number of multi-shell diffusion models have shown potential for providing various estimates of physio-pathological information, yet their reliability and reproducibility in multicenter studies remain to be assessed. In this article, we describe a multi-shell diffusion dataset collected from three traveling subjects with identical acquisition settings in ten imaging centers. Both the scanner type and the imaging protocols for anatomical and diffusion imaging were strictly controlled. By providing an open resource for advanced and novel microstructural and tractography modelling and quantification, this dataset is expected to support individual-level reproducibility studies through multicenter collaboration.

(2020) 7:157 | https://doi.org/10.1038/s41597-020-0493-8 | www.nature.com/scientificdata
[…] which is efficient but requires more individual-level validation for different diffusion measures. In both scenarios, data acquired from the same subjects on multiple scanners are required. To date, only a few repositories provide diffusion data acquired with scanners of different types 24 and magnetic field strengths 25, and multicenter data acquired with strictly consistent protocols remain scarce.
Here, we release a multicenter diffusion dataset collected on three traveling subjects in ten centers, with two additional repeated scans in one center. The dataset has two notable features. First, the same scanner model and the same scan protocol were used in all centers, the raw data were acquired following the same operating procedure, and all data were pre-processed with the same pipeline. Second, the diffusion protocol uses a popular multi-shell scheme that can be adapted to most diffusion models, either for multi-compartment analysis or for precise white matter tractography that resolves crossing fibers. Part of the dataset has been used in one of our previous publications 26. A well-established data platform is crucial for refining and evaluating novel methods; because high-performance MRI scanners were used for data collection, this dataset could help evaluate the individual-level reproducibility of advanced diffusion models in multicenter studies.
In the following sections, we briefly describe the dataset acquisition and processing procedures, technical validation, and sharing and access policy.

Methods
Data characteristics. The data were collected at 10 centers from October 2016 to November 2017. Three healthy traveling subjects (one male, 23 years of age, and two females, 26 and 23 years of age) were scanned once on each of nine scanners and three times on one scanner (Center 10). All 10 scanners were 3 T MAGNETOM Prisma systems (Siemens, Erlangen, Germany) with a maximum gradient strength of 80 mT/m and a slew rate of 200 T/m/s. The software version was Syngo MR VD13D on nine scanners and Syngo MR VE11C on one scanner (Center 09). All scans used the same type of 64-channel head coil and the same acquisition parameters (details below).
All scans were performed by the same operator following a fixed operating procedure. For each subject, anatomical and diffusion images were acquired. For anatomical imaging, the field-of-view (FOV) was positioned according to head orientation using the auto-align function of the scanner's Brain Dot Engine; for diffusion imaging, the FOV was oriented parallel to the anterior commissure-posterior commissure (AC-PC) line. The FOV center was also aligned with the isocenter of the main magnetic field by moving the scan table.

Diffusion-weighted image (DWI).
DWIs were obtained using a prototype simultaneous multi-slice (SMS) spin-echo echo planar imaging (EPI) sequence 28. The imaging parameters were as follows: TR/TE = 5.4 s/71 ms, FOV = 220 × 220 mm², 93 slices, voxel size = 1.5 × 1.5 × 1.5 mm³, bandwidth = 1712 Hz/Px, GRAPPA factor = 2, and SMS factor = 3. Two series were acquired separately with reversed phase-encoding (PE) directions, anterior-to-posterior (AP) and posterior-to-anterior (PA). For the monopolar diffusion gradients, the gradient duration and diffusion time were 15.9 ms and 34.4 ms, respectively. The diffusion scheme, generated with a multi-shell vector sampling tool 29, contained 30 directions with uniform angular coverage on each of three shells (b = 1000, 2000 and 3000 s/mm²), with no direction collinear with a direction on another shell. Six non-diffusion-weighted (b = 0) frames were evenly distributed throughout the scheme. The total acquisition time was 19 minutes and 04 seconds.
Computational pre-processing. In practice, DWIs are sensitive to noise, magnetic field inhomogeneity and head motion, which cause image imperfections that must be corrected during pre-processing. For the released dataset, the raw images, stored in Digital Imaging and Communications in Medicine (DICOM) format, were converted to Neuroimaging Informatics Technology Initiative (NIfTI) format. The DWIs were then pre-processed with a common pipeline comprising denoising with MRtrix3 30 (version 0.3.15-500-g382393bb), Gibbs-ringing removal 31, and distortion and motion correction with the FSL suite (version 5.0.11). Susceptibility-induced distortion was estimated with TOPUP from the non-diffusion-weighted images acquired with opposite PE directions, yielding a field map 32. Distortion and motion were then jointly corrected with EDDY 33, and the diffusion gradient vectors were rotated according to the motion estimated by EDDY. Finally, the corrected AP and PA images were combined for subsequent analysis.
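The scheme above amounts to 3 × 30 + 6 = 96 volumes per PE series. As a minimal sketch, the Python below builds such a b-value list and the small text tables that FSL's TOPUP (`--datain`) and EDDY (`--acqp`, `--index`) read for an AP/PA pair. The positions of the b = 0 frames, the order of the shells, the total readout time and the function names are hypothetical choices made for illustration. For real processing, use the b-values, vectors and sidecar metadata shipped with the dataset.

```python
"""Sketch: per-series b-values and FSL TOPUP/EDDY acquisition tables for an AP/PA pair.

HYPOTHETICAL choices (not stated in the text): b = 0 frames at every 16th
volume starting at volume 0, shells listed in ascending order, and the
placeholder total readout time READOUT_S.
"""

SHELLS = (1000, 2000, 3000)  # s/mm^2, from the protocol
DIRS_PER_SHELL = 30          # directions per shell, from the protocol
N_B0 = 6                     # non-diffusion-weighted frames, from the protocol
READOUT_S = 0.05             # HYPOTHETICAL total readout time in seconds


def bvals_one_series():
    """b-values for one PE series: 90 diffusion-weighted frames plus 6 evenly spaced b = 0 frames."""
    weighted = [b for b in SHELLS for _ in range(DIRS_PER_SHELL)]
    n_total = len(weighted) + N_B0          # 96 frames per series
    step = n_total // N_B0                  # one b = 0 frame every 16 frames
    frames = iter(weighted)
    return [0 if i % step == 0 else next(frames) for i in range(n_total)]


def acqparams_rows(readout_s=READOUT_S):
    """Rows for TOPUP --datain / EDDY --acqp: PE unit vector plus total readout time.

    Row 1 is AP (-y) and row 2 is PA (+y). This sign convention is a common
    choice; what matters is that the two rows have opposite signs and are
    used consistently.
    """
    return [f"0 -1 0 {readout_s}", f"0 1 0 {readout_s}"]


def eddy_index(n_ap, n_pa):
    """EDDY --index entries for AP volumes followed by PA volumes (1-based acqparams rows)."""
    return [1] * n_ap + [2] * n_pa


if __name__ == "__main__":
    bvals = bvals_one_series()
    print(len(bvals), bvals.count(0))  # 96 6
    print("\n".join(acqparams_rows()))
```

The index file simply tells EDDY which acquisition-parameter row applies to each volume of the concatenated AP+PA series. With 96 volumes per series, that gives 96 entries of 1 followed by 96 entries of 2.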

Data Records
Data privacy. Data collection was approved by the institutional review board of Xuanwu Hospital, Capital Medical University, Beijing, China, and all volunteers signed informed consent forms before participation. Facial features were removed from the images using FreeSurfer (version v1.379.2.73).
Distribution for use. The dataset is organized according to the Brain Imaging Data Structure (BIDS) standard 34 and is publicly available on figshare 35. The multicenter scans of each subject are stored as separate sessions whose labels identify the center ID and repetition number. Each session folder contains the subfolders "anat" and "dwi" for the anatomical and diffusion images, respectively. Images are stored in compressed NIfTI format, with sidecar JSON files placed alongside the corresponding images. In addition, the file with the suffix "eddy_parameters" produced by EDDY for each scan is provided under the folder "derivatives".
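With nine single-visit centers and three visits at Center 10, each subject should have 9 + 3 = 12 sessions, or 36 in total. The sketch below shows one way to check this from BIDS file names. It relies only on the general BIDS naming rule (underscore-separated `key-label` entities followed by a suffix). The session labels in the example are hypothetical, since the text does not spell out the dataset's exact label format, and the function names are illustrative.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# A BIDS entity is a "key-label" pair; entities are joined by underscores and
# the last underscore-separated part is the suffix (e.g. "dwi", "T1w").
ENTITY = re.compile(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)")


def parse_bids_name(path):
    """Return (entities, suffix) parsed from a BIDS file name."""
    stem = PurePosixPath(path).name.split(".", 1)[0]  # drop .nii.gz, .json, ...
    *pairs, suffix = stem.split("_")
    entities = {}
    for part in pairs:
        match = ENTITY.fullmatch(part)
        if match is None:
            raise ValueError(f"not a BIDS entity: {part!r}")
        entities[match.group(1)] = match.group(2)
    return entities, suffix


def sessions_per_subject(paths, suffix="dwi"):
    """Map each subject label to the sorted session labels that contain `suffix` files."""
    sessions = defaultdict(set)
    for path in paths:
        entities, suf = parse_bids_name(path)
        if suf == suffix and "sub" in entities and "ses" in entities:
            sessions[entities["sub"]].add(entities["ses"])
    return {sub: sorted(labels) for sub, labels in sessions.items()}


if __name__ == "__main__":
    # HYPOTHETICAL file names; the real session labels are defined by the dataset.
    demo = [
        "sub-01/ses-c01r1/dwi/sub-01_ses-c01r1_dwi.nii.gz",
        "sub-01/ses-c10r2/dwi/sub-01_ses-c10r2_dwi.nii.gz",
        "sub-01/ses-c10r2/anat/sub-01_ses-c10r2_T1w.nii.gz",
    ]
    print(sessions_per_subject(demo))  # {'01': ['c01r1', 'c10r2']}
```

On a downloaded copy, the same function can be fed with `str(p) for p in Path(root).glob("sub-*/ses-*/dwi/*_dwi.nii.gz")`, where `root` is the local dataset folder.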

Technical Validation
Quality control for diffusion data. We evaluated the quality of the pre-processed images. For each diffusion scan, we measured the signal-to-noise ratio (SNR) and ghost-to-signal ratio (GSR) on the non-diffusion-weighted images, computed and compared the background noise of all DWIs, and estimated head motion.
• SNR: The SNR was measured on pairs of non-diffusion-weighted images following the National Electrical Manufacturers Association (NEMA) standard 36. The signal was defined as the mean intensity of the two images within a white matter region of interest (ROI) in the genu of the corpus callosum (GCC). The noise was defined as the standard deviation (SD) of the difference image within the same ROI divided by a correction factor of √2. With six non-diffusion-weighted images per scan, five SNR values were obtained, one from each pair of adjacent images.
• GSR: The GSR quantifies the Nyquist ghost artifact, i.e., signal leakage shifted by half the image size along the PE direction. On the non-diffusion-weighted images, a rectangular ROI of size 10 × 40 at the center of the image was selected as the signal region (ROI-s). A pair of ghost regions (ROI-g) was then defined by shifting ROI-s by half the image size up and down along the PE direction. Finally, two background noise regions (ROI-n) of the same size were defined, shifted left and right along the readout (RO) direction. These ROIs are outlined in blue in Fig. 1a. The mean intensity was computed within each ROI, and the GSR was calculated as the absolute difference between the means of ROI-g and ROI-n divided by the mean of ROI-s.
• Head motion: While pre-processing the DWIs, EDDY estimates head motion and the eddy current-induced field. The translation and rotation parameters relative to the first DWI frame are listed in *.eddy_parameters, together with 10 coefficients describing the quadratic eddy current-induced field. In addition, two summary measures of volumetric movement are given as the root mean square (RMS) voxel displacement of each volume. The first is calculated relative to the first frame and the second between adjacent frames (listed in *.eddy_restricted_movement_rms).
• Noise on DWI: The background noise of the DWIs was compared across centers. Four square 5 × 5 ROIs were drawn in the background at the image corners, avoiding the Nyquist ghost region. The noise was calculated as the SD of all selected voxels divided by a correction factor of 0.66, which accounts for the Rayleigh distribution of background intensities in magnitude images, as suggested in the NEMA standard 37.
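The three image-based metrics above can be sketched in NumPy for a single 2-D slice. This is a minimal illustration under stated assumptions, not the authors' released validation code. Axis 0 is taken as the PE direction, and the 10 × 40 ROI-s is interpreted as 10 voxels along PE and 40 along RO. ROI-g is ROI-s rolled by half the image along PE, which wraps it onto the top and bottom edges. ROI-n is interpreted as ROI-g moved sideways by a user-chosen `ro_shift`. The function names and the `ro_shift` parameter are hypothetical.

```python
import numpy as np


def snr_nema_pair(img1, img2, roi):
    """NEMA difference-method SNR of two repeated images within a boolean ROI."""
    a = np.asarray(img1, dtype=float)   # float avoids unsigned-integer wrap-around
    b = np.asarray(img2, dtype=float)
    signal = 0.5 * (a[roi].mean() + b[roi].mean())
    noise = (a[roi] - b[roi]).std(ddof=1) / np.sqrt(2.0)  # SD of difference / sqrt(2)
    return signal / noise


def snr_adjacent_pairs(b0_frames, roi):
    """One SNR per pair of adjacent b = 0 frames (six frames give five values)."""
    return [snr_nema_pair(x, y, roi) for x, y in zip(b0_frames[:-1], b0_frames[1:])]


def centered_box(shape, size):
    """Boolean mask of a size[0] x size[1] rectangle centred in a 2-D image."""
    mask = np.zeros(shape, dtype=bool)
    r0 = shape[0] // 2 - size[0] // 2
    c0 = shape[1] // 2 - size[1] // 2
    mask[r0:r0 + size[0], c0:c0 + size[1]] = True
    return mask


def ghost_and_noise_masks(roi_s, ro_shift, pe_axis=0):
    """ROI-g: ROI-s moved by half the image along PE (wraps to both edges).
    ROI-n: ROI-g moved by +/- ro_shift voxels along RO (HYPOTHETICAL offset)."""
    ro_axis = 1 - pe_axis
    roi_g = np.roll(roi_s, roi_s.shape[pe_axis] // 2, axis=pe_axis)
    roi_n = np.roll(roi_g, ro_shift, axis=ro_axis) | np.roll(roi_g, -ro_shift, axis=ro_axis)
    return roi_g, roi_n


def gsr(img, roi_s, roi_g, roi_n):
    """GSR = |mean(ROI-g) - mean(ROI-n)| / mean(ROI-s)."""
    img = np.asarray(img, dtype=float)
    return abs(img[roi_g].mean() - img[roi_n].mean()) / img[roi_s].mean()


def background_noise(img, box=5, rayleigh_factor=0.66):
    """SD of four box x box corner patches divided by the Rayleigh correction factor."""
    img = np.asarray(img, dtype=float)
    corners = (img[:box, :box], img[:box, -box:], img[-box:, :box], img[-box:, -box:])
    return np.concatenate([c.ravel() for c in corners]).std(ddof=1) / rayleigh_factor
```

The factor 0.66 reflects that the SD of a Rayleigh-distributed background is about 0.655 times the underlying Gaussian noise level. Dividing by it therefore recovers an estimate of that level. For a 3-D volume, the functions would be applied slice by slice.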
Results of quality metrics. Figure 1 presents the quality metrics of the non-diffusion-weighted images. For each scan, the minimum, median and maximum of the five SNR values and the six GSR values are shown. Figure 2 presents the RMS motion between adjacent DWI frames for both the AP and PA images. Figure 3 shows the image noise evaluated from all DWI frames, reordered by b-value.
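The per-scan summaries in Figs. 1 and 2 can be reproduced along the following lines. The sketch assumes the column layout of EDDY's movement RMS output: one row per volume, with the first column relative to the first volume and the second relative to the previous one. The function names and the demo numbers are illustrative only and are not results from the dataset.

```python
import numpy as np


def summarize(values):
    """(minimum, median, maximum) of per-scan metrics such as the five SNRs or six GSRs."""
    v = np.asarray(values, dtype=float)
    return float(v.min()), float(np.median(v)), float(v.max())


def adjacent_frame_rms(source):
    """Frame-to-frame RMS displacement from an EDDY *_movement_rms style table.

    `source` is a file path (or a list of text lines). Each row is one volume;
    the first column is relative to the first volume and the second column is
    relative to the previous volume, which is the quantity returned here.
    """
    table = np.loadtxt(source, ndmin=2)
    return table[:, 1]


if __name__ == "__main__":
    # Illustrative values, not measurements from the dataset.
    print(summarize([18.2, 20.1, 19.5, 21.0, 19.9]))  # (18.2, 19.9, 21.0)
    print(adjacent_frame_rms(["0 0", "0.12 0.12", "0.20 0.09"]))
```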
Visualization and reproducibility of diffusion models. Beyond the quality metrics of the DWIs, the reproducibility of the outcomes generated by diffusion models and algorithms must also be considered. As an example, we performed fiber tractography post-processing with MRtrix3. Because the fiber tracking framework is complex, we visualized both an intermediate measure reflecting voxel-wise fiber orientation and the final tractography, and compared the results of each subject across all centers.
• Fiber orientation distribution (FOD): The FOD, obtained by spherical deconvolution of the diffusion signal profile, represents the continuous angular distribution of the underlying fibers in each voxel. For the multi-shell data, multi-shell multi-tissue constrained spherical deconvolution 10 was used to estimate the FOD with a maximum spherical harmonic order of six. Figure 4 illustrates the FODs in an ROI containing crossing fibers from the forceps minor, the anterior thalamic radiation and the corticopontine tract.
• Track density imaging (TDI): Anatomically constrained tractography (ACT) 11 was applied to the multi-shell data to generate streamlines from the FODs, with a step size of 0.15 mm, streamline lengths between 3 and 250 mm, and an FOD amplitude cut-off of 0.06. Spherical-deconvolution informed filtering of tractograms (SIFT) 38 was then used to improve the accuracy of the fiber tractography. After filtering, the whole-brain tractogram contained one million streamlines. TDI, used here to present the tractography, quantifies the number of streamlines passing through each voxel. Figure 5 illustrates the TDI maps from all centers.
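Two quantities in these steps are easy to make concrete. First, an FOD with maximum harmonic order six is stored as (6 + 1)(6 + 2)/2 = 28 even-order spherical harmonic coefficients per voxel. Second, a TDI map counts how many streamlines visit each voxel. The sketch below is a simplified NumPy illustration, not MRtrix3's `tckmap`. It only bins the sampled streamline points, which is adequate when the step size (0.15 mm here) is much smaller than the 1.5 mm voxels. The coordinate convention (streamlines in mm, with voxel (0, 0, 0) spanning [0, voxel_size)) and the function names are assumptions.

```python
import numpy as np


def n_sh_coeffs(lmax):
    """Number of even-order real spherical harmonic coefficients up to lmax."""
    if lmax < 0 or lmax % 2:
        raise ValueError("lmax must be a non-negative even integer")
    return (lmax + 1) * (lmax + 2) // 2


def streamline_length(points):
    """Arc length (mm) of an (N, 3) polyline given in mm."""
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())


def track_density(streamlines, shape, voxel_size, min_len=3.0, max_len=250.0):
    """Count how many streamlines visit each voxel (each streamline counted once per voxel).

    Assumes streamlines are (N, 3) arrays in mm with voxel (0, 0, 0) spanning
    [0, voxel_size) on each axis; streamlines outside [min_len, max_len] mm
    are skipped. Only the sampled points are binned, so the step size must be
    much smaller than the voxel size.
    """
    dims = np.asarray(shape)
    tdi = np.zeros(shape, dtype=np.int64)
    for sl in streamlines:
        sl = np.asarray(sl, dtype=float)
        if not min_len <= streamline_length(sl) <= max_len:
            continue
        ijk = np.floor(sl / voxel_size).astype(int)
        inside = np.all((ijk >= 0) & (ijk < dims), axis=1)
        if inside.any():
            visited = np.unique(ijk[inside], axis=0)  # one count per voxel per streamline
            tdi[tuple(visited.T)] += 1
    return tdi


if __name__ == "__main__":
    print(n_sh_coeffs(6))  # 28
```

The same formula gives 45 coefficients for a maximum order of eight, which is why lower orders are sometimes chosen to keep FOD images compact.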

Usage Notes
The data from eight centers (01-07 and 10) in the current dataset were analyzed in our previous publications 26; the data from Centers 08 and 09 are newly added here. Note also that our previous work 26 used an earlier version of the FSL suite (5.0.10) for pre-processing and did not apply Gibbs-ringing removal.
We welcome collaborations to explore this dataset further.

Code availability
The code for the pre-processing steps described above is included in the released dataset. We also share a code package for running the pre-processing pipeline and the technical validation, which can be accessed on figshare 39.