Introduction

Diffusion tensor imaging (DTI), which is based on proton mobility, evaluates the diffusion properties of water. DTI provides diffusion characteristics that reflect the tissue integrity of white matter fibers: fractional anisotropy (FA), which describes the global anisotropy of the analyzed structures1, and the apparent diffusion coefficient (ADC). Previous studies have revealed that DTI is a useful diagnostic MR technique for detecting subtle microstructural damage to the spinal cord that appears in the absence of apparent abnormalities on conventional T2-weighted MR images2,3,4,5,6,7,8,9.

Cervical spondylotic myelopathy (CSM) is a chronic compressive spinal cord disorder caused by spondylosis or disc degeneration2,10,11,12. This disease affects a relatively large population and can induce irreversible neurological injury through progression of myelopathy. Therefore, accurate evaluation of disease severity and assessment of postoperative outcomes are important for the treatment of patients with CSM. Currently, the diagnosis and assessment of CSM are usually performed using a combination of clinical evaluation and MRI study. However, the basic limitation of conventional MRI is that these directly obtained findings do not correlate accurately with the patient’s symptoms or functional status in general and cannot predict the prognosis after surgical decompression.

Previously, Demir et al.13 reported that ADC values and diffusion tensor measurements showed better sensitivity than T2-weighted imaging for detecting myelopathy in CSM patients. Recently, several reports have suggested that DTI might be useful for evaluating the severity of myelopathy and predicting surgical outcomes in CSM7,14,15,16. However, contrary to DTI of the brain, DTI of the cervical spinal cord has basic technical difficulties, including large motion artifacts caused by swallowing, respiratory movement, and cerebrospinal fluid pulsation; susceptibility artifacts caused by abrupt changes in internal structures antero-posteriorly; and practical difficulty in measuring the exact region-of-interest (ROI) in the spinal cord. For DTI parameter assessment, the ROI must be placed manually, and possible measurement error is a serious concern. For clinical application of DTI in assessing CSM, the examination should have acceptable reliability and reproducibility.

Nevertheless, few studies have reported on DTI reliability in patients with CSM. Therefore, this study assessed the test–retest and inter-observer reliability of cervical spinal cord DTI in CSM patients, as well as the agreement among three ROI measurement methods.

Results

Overall ICC values

ICC values for FA measurements at each spinal cord level among the four observers are shown in Table 1. The overall agreement among the four observers varied according to spinal cord level and measurement method, ranging from poor to excellent (ICC = 0.374–0.821). Only levels C3/C4 and C4/C5 showed good-to-excellent agreement among observers for the first and second measurements in both the mean and manual ROI methods (ICC = 0.732–0.821).

Table 1 Overall ICCs of FA measurements among the four observers.

Among the methods, sagittal ROI measurements showed relatively lower agreement among observers for the C6/C7 level than almost all spinal cord levels, except for the second measurement of C1/C2. Moreover, lower ICC values, reflecting poor inter-observer agreement, were found at the C2/C3, C5/C6, and C7/T1 levels using the sagittal ROI method (ICC = 0.374–0.432).

Test–retest and inter-observer reliability

ICC values for the four observers from the test–retest reliability assessment are shown in Table 2. Test–retest reliability varied for observers 1 and 2 (ICC = 0.460–0.959); however, observers 3 and 4 showed excellent test–retest reliability at all spinal cord levels for the three measurement methods (ICC = 0.887–0.997), except for observer 4 at the C1/C2 level using the sagittal ROI method (ICC = 0.645).

Table 2 Test–retest reliability of the three measurement methods among the four observers.

Based on the test–retest reliability results, the four observers were divided into two groups: the medical student group (observers 1 and 2) and the radiology resident and neuro-radiologist group (observers 3 and 4). ICC values within these two groups for the three measurement methods are shown in Table 3. Inter-observer agreement between observers 1 and 2 varied widely across the different spinal cord levels and measurement methods (ICC = 0.510–0.954), which was similar to the test–retest reliability results.

Table 3 Inter-observer reliability for the three measurement methods within two groups.

However, despite the excellent test–retest reliability, the inter-observer agreements of the radiology resident and neuro-radiologist group (observers 3 and 4) also varied widely, with fair-to-good agreement (ICC = 0.404–0.747) for almost every spinal segment and all three measurement methods. In particular, there was poor inter-observer agreement at the C2/C3 (ICC = 0.275 and 0.302 for the first and second measurements, respectively), C5/C6 (ICC = 0.222 and 0.404 for the first and second measurements, respectively), and C7/T1 levels (ICC = 0.084 and 0.157 for the first and second measurements, respectively) when using the sagittal ROI method. There was excellent inter-observer agreement only at the C3/C4 (ICC = 0.792 and 0.800 for the first and second measurements, respectively) and C4/C5 levels (ICC = 0.756 and 0.773 for the first and second measurements, respectively) when using the manual ROI method.

The differences between observers 3 and 4 in the calculated mean FA values from C1/C2 through C7/T1 for all subjects are shown in Table 4. Between observers 3 and 4, there were statistically significant differences in mean FA values for all three measurement methods. Among the three methods, the sagittal ROI method yielded a statistically significantly lower value than did the other methods for both observers. There was no statistically significant difference in the mean FA values obtained by the mean versus manual ROI measurements for both observers.

Table 4 Differences in mean FA values from C1/2 through C7/T1 for all subjects between observers 3 and 4.

Discussion

This study is one of the first to evaluate the reliability of DTI in assessing the cervical spinal cord in adult patients with CSM. The aim was to assess the test–retest and inter-observer reliability of FA values at all intervertebral disc levels of the whole cervical spinal cord in patients with CSM. To date, only three studies17,18,19 have reported assessment of the reliability of ROI placement to quantify DTI measurements in the cervical spinal cord. Two studies were performed on pediatric spinal cords, with or without spinal cord injury. Only one study was performed on the cervical spinal cord in a healthy adult population.

Mulcahey et al.18 reported good to strong test–retest reliability for diffusivity values at each level of the spinal cord. Likewise, the reliability of FA values for the mid-C4 level and levels at and below C5–C6 was good. Despite only fair repeatability for FA values at several levels, the data suggested that repeated DTI values can be obtained for children with chronic cervical-level spinal cord injury, with evidence of good to strong reliability for mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD), and fair-to-good reliability for FA.

Barakat et al.17 also reported that inter- and intra-observer agreement between two ROI measurement methods (a freehand ROI and a fixed-size ROI) showed moderate (ICC = 0.5) to strong (ICC = 0.84) agreement in the normal pediatric spinal cord, and that FA values showed the highest variability among DTI parameters (ICC = 0.10–0.87).

Brander et al.19 conducted a study on the reproducibility of ROI measurements using a freehand technique in a healthy adult population. They reported that the intra-observer variation of the measurements for whole-cord FA and for the ADC showed almost perfect agreement, when using both ROI- and tractography-based measurements. There was greater variation in measurements of individual columns. Inter-observer agreement varied from moderate to strong for whole-cord FA and for the ADC.

In our study, repeated ROI measurements revealed wide variations in the inter-observer reliability, which was in contrast to the findings of previous articles. Although the radiology resident and neuro-radiologist group (observers 3 and 4) showed excellent test–retest reliability, inter-observer reliability showed fair-to-good agreement for almost every spinal segment using the three measurement methods. Furthermore, there were statistically significant differences in the calculated mean FA values obtained when these observers used the three ROI measurement methods. This finding has practical implications because DTI data for the analysis of cervical cord abnormalities can include a basic measurement error bias, which may cause a reliability problem in clinical use. If the placement of the ROI includes more CSF or adjacent structures, the FA value will decrease; Fig. 1 clearly demonstrates this phenomenon. Observer 1 placed more single-voxel ROIs than did observer 3. According to the axial T2-weighted TSE scan, the coverage of the total ROI placement exceeded the true outline of the spinal cord and contained more CSF. Consequently, calculated FA values were markedly lower for observer 1 than for observer 3. Furthermore, in patients with CSM, a compressed spinal cord and the low spatial resolution of DTI make it difficult to place an exact ROI that includes only the spinal cord. Recently, several reports have indicated that pre-operative FA values correlated well with symptoms or functional status in patients with CSM and could predict surgical outcomes7,14,15. According to our results, DTI parameters and clinical data analysis should be interpreted with caution.
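
The effect of including CSF in an ROI can be illustrated with a small numerical sketch. The voxel values below are hypothetical and were not taken from the study data; the helper `mean_fa` is an illustrative name, and the only assumption is that the ROI value is the plain average of the FA of the selected voxels.

```python
def mean_fa(voxel_fa):
    """Average FA over the voxels selected for one ROI."""
    if not voxel_fa:
        raise ValueError("ROI contains no voxels")
    return sum(voxel_fa) / len(voxel_fa)


# Hypothetical voxel values, NOT taken from the study data.
cord_voxels = [0.62, 0.66, 0.60, 0.64]  # voxels lying inside the cord
csf_voxels = [0.15, 0.10]               # near-isotropic voxels containing CSF

tight_roi = mean_fa(cord_voxels)               # ROI restricted to the cord
loose_roi = mean_fa(cord_voxels + csf_voxels)  # ROI that spills into the CSF

print(f"cord only: {tight_roi:.3f}, cord + CSF: {loose_roi:.3f}")
```

Under these assumptions, adding even a few low-anisotropy voxels pulls the ROI mean down, which is the direction of the difference seen between observers 1 and 3.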

Figure 1

Fractional anisotropy (FA) measurements by observers 1 and 3 at the C5/C6 level using the mean region-of-interest (ROI) method in diffusion tensor images of a 64-year-old man. (a) An axial T2-weighted turbo spin-echo (TSE) scan of the C5/C6 intervertebral disc level is shown. The spinal cord and adjacent cerebrospinal fluid (CSF) space are clearly demonstrated. (b) The first and second FA measurements by observer 1 at the C5/C6 level obtained using the mean ROI method. The calculated mean FA values are 0.424 and 0.400. (c) The first and second FA measurements by observer 3 at the C5/C6 level obtained using the mean ROI method. The calculated mean FA values are 0.635 and 0.635.

According to our data, the sagittal ROI measurement method showed lower test–retest and inter-observer reliability and statistically different mean FA values as compared with the other methods. Thus, it seems that the sagittal ROI method is not appropriate for measurement in a clinical setting. Particularly at the C2/C3, C5/C6, and C7/T1 levels, there was poor inter-observer agreement when using sagittal ROI measurement. This result is similar to that of Barakat et al.17. The upper cervical (C2/C3) and lower cervical (C7/T1) levels were located at the edge of the coil that was used for imaging the patients. In these regions, the signal-to-noise ratio (SNR) decreases and the signal drop is marked. Furthermore, the relatively wide CSF space, as compared to other spinal levels, may lead to inclusion of more CSF when placing the ROI. All these factors can contribute to lower inter-observer agreement when using the sagittal ROI method. The low agreement at the C5/C6 level may have been related to C5/C6 being the segment most commonly affected by central canal compromise in patients with CSM, and to the lower cervical levels (C4–C7) being the most sensitive to cardiac motion. Therefore, obscured anatomical margins and some cardiac-related artifacts may have biased the placement of ROIs.

As a possible solution to these ROI problems, Yokohama et al.20 reported a more reliable DTI method with better visibility for 3-T MRI, called reduced-FOV or zonally oblique multislice (ZOOM) DTI. They concluded that ZOOM DTI provides better visibility with less distortion and high accuracy, using a small FOV and a shorter practical scan time than conventional DTI. Moreover, using this ZOOM DTI method, Iwasaki et al.21 reported that pre-surgical FA values are affected by an "aligned fibers effect", in which compressed fibers show higher FA values, and that those values are therefore not suitable as prognostic predictors. Ultimately, accurate parameter measurement is important for the clinical utility of DTI, and technical improvement is therefore necessary to clearly distinguish the boundary between the spinal cord and CSF by reducing image distortion and improving spatial resolution. A method such as ZOOM DTI would be a good solution, and the authors also proposed that, to attain faster, higher-resolution DTI sequences, a combination of ZOOM DTI and recently introduced techniques such as turbo spin-echo (TSE)-DWI and multi-band SENSE is desirable.

There are some limitations to our study. First, the resolution of the DTI in this study was low, which may have increased the deviation between observers when assessing reliability. In fact, our institution currently uses a DTI protocol with increased slice thickness and NEX (4 mm and 14, respectively), and reduced-FOV technology could also be a solution for increasing resolution. With a sufficient increase in resolution, the reliability of the DTI parameter measurements might have been higher. In addition, an appropriate anatomic reference may need to be provided, but only the sagittal ROI measurement method used the guidance of the T2-weighted scan. If T2-based references (T2-weighted or T2*-weighted images) had been used, ROI placement would likely have been more reproducible. Second, only patients with CSM were enrolled. Compared with a normal healthy population, anatomical changes, such as underlying spondylosis or cord compression, may make it difficult to place the ROI exactly on the spinal cord. However, CSM is the most common form of spinal cord dysfunction22 and the reliability of FA measurements in this group of patients has clinical importance. Third, although all observers who performed measurements had a consensus training session for ROI placement before the study, their experience in DTI and related measurements was relatively limited. Furthermore, training with the standardized protocol was insufficient, especially for the inexperienced observers, and this may have affected the interpretation of the reliability results. The finding that test–retest reliability was high while inter-observer reliability was relatively low suggests that each observer applied a different measurement approach, depending on his or her understanding of anatomy and MR images, even though consensus training was performed before the measurements. Therefore, we believe that the reliability of DTI can be properly determined only if training with a standardized protocol is conducted before the study and experts with sufficient practical experience serve as observers. Fourth, we did not use cardiac gating during the image acquisition, which can diminish flow artifacts from CSF. However, using cardiac gating may increase other motion artifacts caused by respiration or swallowing, because of the lengthened examination time. Fifth, we evaluated only FA values among DTI parameters, and did not investigate the ADC, AD, RD, or MD. Previous studies17,18 demonstrated relatively low and variable reliability in FA values as compared with other diffusion parameters. Thus, further study evaluating the reliability of other DTI parameters is needed to reveal overall reliability. Finally, an ideal reliability study requires a prospective design with an a priori hypothesis; a prospective, high-resolution DTI study is required to support our findings.

In conclusion, for use as a diagnostic tool, data obtained by measurements on DTI should have high reliability and reproducibility. Despite excellent test–retest reliability of the ROI measurements, FA values in patients with CSM varied widely in terms of inter-observer reliability in our study. Therefore, DTI parameters should be interpreted with caution when applied clinically. Furthermore, education and practical training in DTI methods are imperative to ensure reliable measurements.

Methods

This retrospective study was approved by the Seoul National University Bundang Hospital institutional review board (IRB No: B-1406-256-102). This study was conducted in accordance with relevant guidelines and regulations and the Declaration of Helsinki. The research posed no more than minimal risk to participants and was reviewed through an expedited review. Therefore, the requirement for informed consent was waived by the Seoul National University Bundang Hospital institutional review board.

Study subjects

We retrospectively searched the electronic medical record system and Radiology Department database at our institution between July 2013 and December 2013, for cases meeting the following inclusion criteria: (1) a clinical diagnosis of CSM, (2) availability of pre-operative diffusion tensor MRI scans, and (3) the use of surgical decompression. All patients had neurological signs and symptoms with clear evidence of cervical spinal cord compression due to cervical spondylosis on conventional cervical spine MRI. The evaluation of myelopathy was performed using a modified Japanese Orthopedic Association score. We excluded patients with (1) tumor-, trauma-, or infection-related cord compression, (2) prior surgery, (3) coexisting neurologic disorders, such as acute transverse myelitis or multiple sclerosis, and (4) suboptimal image quality due to severe artifacts.

Finally, 34 patients (12 men, 22 women; mean age, 58.7 [range 45–79] years) were enrolled in this study. For surgical treatment, anterior cervical discectomy and fusion was performed in 28 cases, laminoplasty was performed in four cases, and posterolateral fusion was performed in two cases.

DTI protocol

All pre-operative DTI was performed within 2 weeks prior to surgery. All MRI examinations of the cervical spinal cord were performed using a 3-T MRI scanner (Achieva, Philips Medical Systems, Best, The Netherlands). No upgrade or other changes were made to the MRI system software in this study. During the image acquisition process, all subjects were placed in the supine position with 16-channel neurovascular coils applied to the cervical region.

Sensitivity-encoding (SENSE) single-shot echo planar imaging (EPI) was used. The MR protocol was as follows23:

  1. SENSE factor: 4 (for sagittal DTI); excitations: 4
  2. b-value: 600 s/mm²
  3. diffusion gradient directions: 15; diffusion gradient strength: 40 mT/m
  4. slice thickness: 2 mm
  5. fold-over direction: anterior–posterior; fat shift direction: posterior
  6. TR/TE: 3,400/60 ms; matrix: 124 × 124; FOV: 250 mm; voxel size: 2 × 2 × 2 mm
  7. scan time: 4 min
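
As a quick consistency check on the protocol above, the in-plane voxel size can be estimated from the FOV and the matrix. This is a minimal sketch that assumes a square FOV and a square acquisition matrix; the function name `in_plane_voxel_mm` is illustrative and is not part of any scanner software.

```python
def in_plane_voxel_mm(fov_mm, matrix):
    """In-plane voxel edge length, assuming a square FOV and a square matrix."""
    if matrix <= 0:
        raise ValueError("matrix size must be positive")
    return fov_mm / matrix


# FOV and matrix as listed in the protocol: 250 mm over 124 samples.
print(round(in_plane_voxel_mm(250, 124), 2))  # about 2.02 mm
```

The result is close to the nominal 2 mm in-plane voxel size given in the protocol.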

After sending all source images of the DTI to a personal computer, diffusion tensor parameters and fiber tracking were evaluated using the fiber assignment by continuous tracking (FACT) algorithm implemented within the DTI task card software (the Extended MR WorkSpace 2.6, Philips Medical Systems)6,24. In the axial b0 image, two slices (C1 and C7 levels) were selected. Circular ROIs that included the entire spinal cord were placed, and fiber tracking was performed. Only fibers passing through the ROIs were displayed. The thresholds for tracking termination were 0.2 for FA and 30° for the angle between two contiguous eigenvectors.
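
The termination criteria can be sketched as follows. This is not the vendor's FACT implementation; it is a minimal illustration that uses the standard definition of FA from the three tensor eigenvalues and assumes that tracking stops when FA falls below 0.2 or when the angle between contiguous principal eigenvectors exceeds 30°. All function and parameter names are hypothetical.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA from the three eigenvalues of the diffusion tensor."""
    md = (l1 + l2 + l3) / 3.0                       # mean diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0                                  # no diffusion signal
    return math.sqrt(1.5 * num / den)


def angle_deg(v1, v2):
    """Angle between two eigenvectors; the sign of an eigenvector is arbitrary."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("eigenvectors must be non-zero")
    cos_angle = min(1.0, abs(dot) / (n1 * n2))      # abs() ignores sign flips
    return math.degrees(math.acos(cos_angle))


def should_terminate(fa, v_prev, v_next, fa_min=0.2, max_angle=30.0):
    """Assumed stopping rule: low anisotropy or a sharp turn ends the tract."""
    return fa < fa_min or angle_deg(v_prev, v_next) > max_angle


# Hypothetical eigenvalues (mm^2/s) and directions, for illustration only.
fa = fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3)
print(should_terminate(fa, (0, 0, 1), (0, 0.1, 1)))
```

Taking the absolute value of the dot product is a design choice: eigenvectors define an axis rather than a direction, so a sign flip between neighboring voxels should not count as a turn.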

Image and measurement analysis

A total of four observers (two elective medical university students [observers 1 and 2], one third-year radiology resident [observer 3], and one neuro-radiologist with 3 years of experience in DTI [observer 4]) independently measured FA values, twice, after consensus training. To prevent recall bias, each measurement was performed at an interval of 1 month. After sending all source DTI images to a personal computer, each observer, who was blinded to the clinical condition of each patient, measured the FA value in the cervical spinal cord at the level of each spine segment. For the FA measurements, ROIs were manually drawn on axial and sagittal color tensor maps along the cervical spinal cord at the level of each cervical intervertebral disc. Spine segments were selected for each disc level from C1/2 to C7/T1, with reference to a mid-sagittal T1-weighted image.

Three measurement methods were used for placing the ROIs in this study. (1) In the mean ROI method, single-voxel ROIs were placed inside the spinal cord on the axial image, with special attention paid to avoiding partial volume effects, magnetic susceptibility effects, and motion artifacts. Average FA values for all voxels inside the spinal cord at each spine segment level were calculated (Fig. 2a,b). (2) In the manual ROI method, each observer manually outlined an ROI up to the outer margin of the spinal cord on an axial FA map, using a freehand technique, which represented approximately one voxel, while being cautious to avoid volume-averaging effects with the cerebrospinal fluid (CSF) (Fig. 2c). (3) In the sagittal ROI method, each ROI was placed manually on the sagittal FA map, similar to the second method (manual ROI) (Fig. 2d). In this method, ROI selection for each spinal level was guided by reconstructed sagittal b0 maps, and axial and sagittal turbo spin-echo (TSE) T2-weighted images.

Figure 2

Three different fractional anisotropy (FA) measurement methods at the C2/C3 level applied in diffusion tensor images of a 53-year-old woman. (a) The mean region-of-interest (ROI) method using each voxel inside the spinal cord on the axial image, guided by a sagittal T2-weighted turbo spin-echo (TSE) image. A total of eight voxels were placed on the spinal cord. (b) The calculated average FA values for all voxels inside the spinal cord from the mean ROI method range from 0.332 to 0.479. These FA values were averaged per cord level across all subjects. In this patient, the mean FA value is 0.417. Additional apparent diffusion coefficient (ADC) values for each voxel were also automatically calculated. (c) The manual ROI method, using a freehand technique, which represents approximately one voxel. The calculated FA value is 0.417. (d) The sagittal ROI method, using a freehand technique, which represents approximately one voxel. The calculated FA value is 0.400.

One of the authors (musculoskeletal radiologist with 4 years of experience in spinal DTI analysis) conducted the image and measurement analyses. To assess test–retest and inter-observer reliability, the FA values measured by the four observers using these three methods were compared.

Statistical analysis

Statistical analyses were performed by one author. The test–retest and inter-observer reliability of the FA values obtained by the four observers using the three measurement methods were assessed using intraclass correlation coefficients (ICCs) with a two-way random model. Test–retest and inter-observer reliability depend primarily on good training of the observers and good standardization of the task. The ICC value could range from 0 to 1; ICC values of less than 0.40 represented poor agreement, values of 0.40–0.75 represented fair-to-good agreement, and values greater than 0.75 represented excellent agreement.
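
For readers who wish to reproduce this type of analysis, a two-way random-effects ICC can be sketched as below. The text does not state whether single or average measures, or absolute agreement or consistency, were used; the sketch assumes the single-measure, absolute-agreement form, often written ICC(2,1). The function names and the example values are hypothetical, and the labels simply apply the cut-offs given above.

```python
def icc_two_way_random(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per subject, one column per observer (no missing values).
    """
    n = len(ratings)         # number of subjects
    k = len(ratings[0])      # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def agreement_label(icc):
    """Apply the cut-offs used in this study."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair-to-good"
    return "excellent"


# Hypothetical FA values: 5 subjects (rows) rated by 2 observers (columns).
example = [[0.61, 0.55], [0.48, 0.47], [0.70, 0.62], [0.52, 0.50], [0.66, 0.58]]
value = icc_two_way_random(example)
print(f"ICC = {value:.3f} ({agreement_label(value)})")
```

Because the absolute-agreement form penalizes a systematic offset between observers, two observers who rank subjects identically but differ by a constant amount will still receive a reduced ICC.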

The differences in the mean FA value, averaged per cord level from C1/C2 through C7/T1 of all study subjects, for all three measurement methods among the observers were assessed using the Wilcoxon signed-rank test. Analyses were performed using SPSS (ver. 21.0, SPSS Inc., Chicago, IL, USA) and MedCalc software (version 13.0, MedCalc Software, Mariakerke, Belgium). A P value < 0.05 was considered statistically significant.
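
The paired comparison can be sketched in plain Python as follows. This is not the SPSS or MedCalc procedure; it is a minimal illustration of the Wilcoxon signed-rank statistic with a two-sided normal approximation, without tie or continuity corrections, so exact tables would be preferable for small samples. The function name and the example values are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, p) for paired samples; two-sided normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p value
    return w, p


# Hypothetical per-subject mean FA values for two observers.
obs_a = [0.61, 0.48, 0.70, 0.52, 0.66, 0.58, 0.63, 0.55]
obs_b = [0.55, 0.47, 0.62, 0.50, 0.58, 0.57, 0.60, 0.51]
w, p = wilcoxon_signed_rank(obs_a, obs_b)
print(f"W = {w}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

Ties are detected by exact equality of the absolute differences, so floating-point FA values may need rounding before ranking.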