Introduction

The International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) is the standard method for the evaluation of neurological impairment following spinal cord injury (SCI).1 The sensory and motor exams are used together to classify the neurological level, motor level and sensory level. The neurological level is the most caudal segment of the cord with normal motor and sensory function bilaterally, whereas the motor level and sensory level are the most caudal segments of the cord with normal bilateral motor function and normal bilateral sensory function, respectively. The ISNCSCI has undergone several revisions since the first edition in 1982, predominantly to improve reliability with regard to both examination and classification.1, 2, 3, 4, 5 The most recent revision6 clarified terminology and testing techniques.

Although a number of studies have examined the reliability of both the examination and the classification of the ISNCSCI, most evaluated the reliability of summed scores,5, 7, 8, 9 providing little insight into the repeatability of scores at individual myotomes and dermatomes or which dermatomes and myotomes may be more prone to reliability issues. In addition to summing motor scores, Savic et al.5 analyzed individual myotomes in 22 patients with SCI and found substantial to almost perfect agreement for all the muscles. To date, only two studies focused their findings of the ISNCSCI on individual myotomes and dermatomes. Jonsson et al.2 evaluated the inter-rater reliability of the ISNCSCI (1992 version) by assessing the degree of agreement between sensory and motor scores at individual myotomes and dermatomes in adults with incomplete SCI. Mulcahey et al.10 looked at intra-rater agreement of repeated motor and sensory scores at individual myotomes and dermatomes in the pediatric population with complete SCI.

Purpose

The aim of this study was to evaluate intra- and inter-rater reliability of the ISNCSCI motor and sensory scores at individual myotomes and dermatomes, respectively, in young persons with SCI. Based on our previous work,10 we hypothesized that, regardless of the American Spinal Injury Association Impairment Scale designation, there would be moderate-to-high agreement for motor scores at individual levels for both inter- and intra-rater reliability. Conversely, for incomplete injuries, we expected agreement of sensory scores at individual levels to be moderate-to-high for intra-rater reliability and only moderate for inter-rater reliability.

Materials and methods

This was a prospective repeated measures multicenter study completed in adolescents and youth.

Subjects or samples

The sample consisted of 189 subjects, 97 with complete SCI and 92 with incomplete SCI (Table 1), between 6 and 21 years of age (average=14.5 years). The average time between injury and exam was 4.3 years (range=0.5–20 years). Each participant had a minimum of two neurological exams performed by one examiner and a maximum of four neurological exams (two exams performed by each of two different examiners), 1–4 days apart. To minimize variation in repeated test scores resulting from actual neurological changes, no subjects had newly acquired (<3 months) injuries, and none had evidence of concomitant brain injury that would interfere with the cognitive ability to participate in the examination.

Table 1 Characteristics of sample with complete (n=97) and incomplete (n=92) SCI

This was a multicenter study involving the Philadelphia (n=121) and Chicago (n=68) Shriners Hospitals for Children, and it was reviewed and approved by the Institutional Review Boards at the participating sites. Written informed consent was obtained from parents or legal guardians of all subjects under 18 years of age, and participants between the ages of 7 and 18 completed informed assents. We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research. Although this study and the articles by Mulcahey et al.,11 Vogel et al.,12 Samdani et al.13 and Chafetz et al.14 reported on distinct types of reliability outcomes and used different stratifications and statistical methods for analyses, the subjects were drawn from the same sample.

Procedure

Repeated measures of the standardized ISNCSCI motor and sensory exams were administered by multiple testers to study intra- and inter-rater reliability. In total, seven raters who were trained in the examination techniques of the ISNCSCI15 participated in the study; four raters had more than 3 years of experience conducting the ISNCSCI, and the remaining three had <1 year of experience. Before the study, reliability of the examination was established among all raters. Each subject had up to four repeated examinations performed by two raters; the two raters each performed two examinations on the same subject, which provided the basis for analysis of within- and between-rater reliability. Raters were not randomized. The sensory examinations were performed according to the standard ISNCSCI methods15 unless the subject was inconsistent at a spinal level. In this case, 8 out of 10 trials had to be identified correctly for sharp/dull discrimination (pin prick (PP)) and light touch (LT) appreciation to receive a score of 2. This stringent sensory testing protocol was employed to ensure accuracy of the data. Scoring was performed by the trained raters, and data were entered into a secure database by two independent assistants blinded to the study.
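The "8 out of 10" rule described above can be expressed as a simple check. The following is a minimal sketch only; the function name and the list-of-booleans input are hypothetical and are not taken from the ISNCSCI manual. Because the text states only the requirement for a score of 2, the sketch returns whether that requirement is met and does not assign any other score.

```python
from typing import Sequence


def meets_score_2_criterion(trial_correct: Sequence[bool],
                            required: int = 8,
                            total: int = 10) -> bool:
    """Return True when at least `required` of `total` trials were correct.

    `trial_correct` holds one boolean per trial at a single dermatome
    (hypothetical representation, chosen for illustration).
    """
    if len(trial_correct) != total:
        raise ValueError(f"expected {total} trials, got {len(trial_correct)}")
    return sum(1 for correct in trial_correct if correct) >= required


# Hypothetical example: 8 correct responses followed by 2 incorrect ones.
print(meets_score_2_criterion([True] * 8 + [False] * 2))
```

What score is recorded when the criterion is not met is not specified in the text, so it is deliberately left out of the sketch.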

Data analysis

All data were rank transformed before analysis. The intraclass correlation coefficient (ICC) was used as the measure of reliability and was calculated separately for the complete and incomplete SCI groups, and separately for intra- and inter-rater reliability. Based on the repeated measures design, a components of variance model (ICC(2,1) for inter-rater reliability and ICC(3,1) for intra-rater reliability) was used to estimate the parameters of interest for the ICC calculations.16 In addition, 95% confidence intervals (CIs) were calculated for all ICC values.
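For readers who want to see the shape of this calculation, the following is a minimal sketch, not the authors' analysis code. It assumes the textbook two-way ANOVA formulas commonly written as ICC(2,1) and ICC(3,1), assumes that ranks are assigned across all scores pooled for one myotome or dermatome, and assumes a complete subjects-by-ratings table; none of these details are specified in the text. Function names and the example scores are hypothetical, and confidence intervals are not computed.

```python
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_transform(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace every score by its rank among all pooled scores (assumption)."""
    k = len(table[0])
    ranks = average_ranks([x for row in table for x in row])
    return [ranks[r * k:(r + 1) * k] for r in range(len(table))]


def icc_single(table: Sequence[Sequence[float]], model: str) -> Optional[float]:
    """ICC for single ratings; rows are subjects, columns are ratings.

    model "2,1": two-way random effects; model "3,1": two-way mixed effects.
    Returns None when the denominator is zero (no variance to partition).
    """
    n, k = len(table), len(table[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two ratings")
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    bms = ss_rows / (n - 1)                      # between-subjects mean square
    jms = ss_cols / (k - 1)                      # between-ratings mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    if model == "2,1":
        denom = bms + (k - 1) * ems + k * (jms - ems) / n
    elif model == "3,1":
        denom = bms + (k - 1) * ems
    else:
        raise ValueError("model must be '2,1' or '3,1'")
    return None if denom == 0 else (bms - ems) / denom


# Hypothetical motor scores at one myotome: 4 subjects, 2 ratings each.
scores = [[5, 5], [3, 4], [0, 0], [2, 1]]
ranked = rank_transform(scores)
print(icc_single(ranked, "2,1"), icc_single(ranked, "3,1"))
```

Returning None when every score is identical loosely mirrors the situation described in the Results, where an ICC could not be calculated for some myotomes and dermatomes.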

For reliability studies, sample size consideration is not based on statistical methodology but rather on the desired precision of the reliability estimates. The sample in this study represents a convenience sample of individuals willing to undergo two to four neurological exams. Although this study contains 189 subjects, there was not an equal distribution of subjects with an SCI at each spinal level; therefore, the sample size at each level was too small to analyze intra- and inter-rater reliability at each dermatome and myotome when subjects were separated based on their level of injury. Consequently, all subjects with complete SCI were analyzed together, as were all subjects with incomplete SCI, regardless of their neurological level, motor level or sensory level. All spinal segments, including those scored absent or normal, were tested and used for data analysis. Both intra- and inter-rater reliability were analyzed for both groups.

Results

The results are shown in Tables 2, 3, 4 and 5, which present ICC values with 95% CIs for each study parameter. ICC values >0.90 reflect high agreement, values from 0.75 to 0.90 reflect moderate agreement and values <0.75 reflect poor agreement.7, 9 The 95% CI provides an indication of the precision of the coefficients; a narrow CI reflects good precision (10–20% of the estimated true value) and a wide CI reflects weak precision. Myotomes and dermatomes with no values indicate that the ICC could not be calculated, either because the variation between subjects was not significant or because there was an inadequate number of subjects.
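The agreement categories above, and the percentages reported in the following subsections, can be reproduced with a short tally. This is an illustrative sketch only; the function names are hypothetical, the example ICC values are invented for demonstration, and `None` is used here as an assumed placeholder for an ICC that could not be calculated.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def classify_icc(icc: Optional[float]) -> str:
    """Map an ICC to the agreement categories used in this section."""
    if icc is None:
        return "not calculable"
    if icc > 0.90:
        return "high"
    if icc >= 0.75:
        return "moderate"
    return "poor"


def tally_agreement(iccs: Sequence[Optional[float]]) -> Dict[str, Tuple[int, float]]:
    """Return {category: (count, percent of all levels)} for a list of ICCs."""
    counts = Counter(classify_icc(value) for value in iccs)
    total = len(iccs)
    return {label: (count, 100.0 * count / total) for label, count in counts.items()}


# Invented ICC values for four levels, for demonstration only.
print(tally_agreement([0.95, 0.80, 0.50, None]))
```

Note that, under the thresholds as stated, an ICC of exactly 0.90 or exactly 0.75 falls in the moderate category.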

Table 2 ICC values indicating intra-rater reliability for myotomes (M) and dermatomes (PP=test of discrimination and LT=test for light touch) in subjects with complete SCI
Table 3 ICC values indicating intra-rater reliability for myotomes (M) and dermatomes (PP=test of discrimination and LT=test for light touch) in subjects with incomplete SCI
Table 4 ICC values indicating inter-rater reliability for myotomes (M) and dermatomes (PP=test of discrimination and LT=test for light touch) in subjects with complete SCI
Table 5 ICC values indicating inter-rater reliability for myotomes (M) and dermatomes (PP=test of discrimination and LT=test for light touch) in subjects with incomplete SCI

Intra-rater reliability

Myotomes

In general, there was moderate-to-high agreement in myotomes. In the subjects with complete SCI, there was high agreement in 60% (12/20) of myotomes with 25% (5/20) having moderate agreement and the remaining 15% (3/20) not able to be calculated; ICC values ranged from 0.78 to 1.0 (Table 2). Out of 20 myotomes in the incomplete SCI group, 75% (15/20) had high agreement with the remaining 25% of myotomes (5/20) having moderate agreement. ICC values ranged 0.84–0.97 (Table 3).

Dermatomes

For PP in the complete SCI group, ICC values ranged from 0.56 to 1.0, with 21.5% (12/56) of dermatomes having high agreement and 41% (23/56) having moderate agreement (Table 2). For LT, the majority of dermatomes, 57% (32/56), showed moderate agreement, with ICC values that ranged 0.45–1.0. As shown in Table 3, subjects with incomplete SCI had ICC values that ranged from 0.46 to 0.85 for PP, with 64% (36/56) of dermatomes having poor agreement. LT values ranged from 0.38 to 0.83, with 52% (29/56) showing moderate agreement and 48% (27/56) having poor agreement.

Inter-rater reliability

Myotomes

The agreement of myotomes in subjects with complete SCI was high in 65% (13/20) of myotomes with moderate agreement in 20% (4/20) of myotomes. The ICC values ranged from 0.78 to 1.0, and 15% (3/20) of myotomes were unable to be calculated (Table 4). Table 5 provides ICC values that range from 0.87 to 0.98 in subjects with incomplete SCI, with 85% (17/20) of myotomes having high and 15% (3/20) of myotomes having moderate agreement.

Dermatomes

The agreement for PP scores in subjects with complete SCI was high in 21.5% (12/56) of dermatomes, with 39% (22/56) and 18% (10/56) having moderate and poor agreement, respectively. ICC values could not be calculated for 21.5% (12/56) of dermatomes, and ICC values ranged 0.56–1.0 (Table 4). For LT scores in this group, agreement was high in 14% (8/56) of dermatomes, with 46% (26/56) and 27% having moderate and poor agreement, respectively. ICC values ranged from 0.43 to 0.94, and ICC values could not be calculated for 13% (7/56) of dermatomes. In subjects with incomplete SCI (Table 5), agreement of PP scores was moderate in 39% (22/56) of dermatomes, with the remaining 61% (34/56) having poor agreement. In contrast, the majority of dermatomes, 61% (34/56), had moderate agreement for LT scores, with 39% (22/56) having poor agreement. ICC values ranged from 0.43 to 0.85.

Discussion

The purpose of this study was to examine inter- and intra-rater agreement of the ISNCSCI exam at every dermatome and myotome in youth and adolescents with complete and incomplete SCI. No dermatomes or myotomes differed significantly between inter- and intra-rater reliability in subjects with incomplete SCI; in subjects with complete SCI, only one myotome (L3-R) and one dermatome each for PP (T12-R) and LT (L5-R) showed differences.

As hypothesized, the agreement of myotomes in both complete and incomplete subjects (inter- and intra-rater reliability) was moderate-to-high, except for those myotomes where an ICC value could not be calculated because of a lack of variability between/within subjects. An ICC of 1 indicates complete agreement; this occurred only in subjects with complete SCI. These findings may have occurred because muscles that were scored as 5 (normal) or 0 (paralyzed) were also included in the results; previous reports10, 17 have shown that agreement of repeated testing of unimpaired and completely paralyzed muscles is usually perfect, and for this reason, previous studies have excluded them from analysis.5 We chose not to exclude myotomes and dermatomes that were unimpaired or completely paralyzed because this would lead to a variable number of myotomes and dermatomes at each level, potentially leading to skewed data. In addition, when conducting a reliability study, it is important to test all scores, regardless of where they fall on the testing scale, to ensure adequate results.

As anticipated, the results of the sensory data are less straightforward. For both inter- and intra-rater reliability, there were no dermatomes with high agreement for sensory testing in subjects with incomplete SCI. There were more dermatomes with poor agreement for PP scoring (61% and 64%) than for LT scoring (39% and 48%) in the inter- and intra-rater reliability analyses, respectively. These findings are similar to those of Jonsson et al.,2 who found acceptable agreement in 49 out of 92 dermatomes tested for LT compared with only 26 out of 96 dermatomes tested for PP in adults with SCI. Interestingly, this is not the case in subjects with complete SCI, in whom there was higher agreement for PP and lower agreement for LT.

Overall, subjects with incomplete SCI had more dermatomes and myotomes with lower ICC values for both sensory (LT and PP) and muscle strength tests when compared with subjects with complete SCI. These differences were statistically significant in 80% of myotomes, and in 78.5% and 77% of dermatomes for PP and LT, respectively (data not shown). These findings are not surprising and support our original hypotheses. Cohen et al.3 found that classification of incomplete injuries is ‘problematical in many areas’ in contrast to classification of complete injuries.

Past studies18, 19, 20 have found natural recovery of motor and sensory function to occur within the first year following an SCI. However, the extent of disagreement on repeated sensory (LT and PP) scores reported in this study is unlikely to be due to neurological improvements, given the short time interval of 2–4 days between all the ISNCSCI testing sessions and the chronicity of the injuries. Thus, the low ICC values must be attributed to other sources of variation, such as the subject, the testing environment, the test itself or a combination of all three.

The current study examined the reliability of the ISNCSCI in children and youths with both complete and incomplete chronic SCI at individual myotomes and dermatomes. Past studies5, 7, 8, 9 have used the overall motor and sensory scores to assess the reliability of the ISNCSCI; however, the summed score may be the same or different based on what is happening at the individual level. Therefore, summed motor or sensory scores are not an optimal method to detect motor and sensory changes that occur due to recovery or treatment, nor are they optimal for detecting progression or remediation of symptoms. Although lower agreement values were expected in subjects with incomplete SCI, the degree of poor agreement on the sensory scores was unanticipated, particularly in light of the strict ‘8 out of 10’ criteria set forth in the ISNCSCI manual.15 This raises some concern about the results of the ISNCSCI sensory examination in youths with incomplete SCI. Because our findings indicate unacceptable ranges of agreement, clinical trials that define improvements in sensory and/or motor function by changes in scores at individual spinal levels may not reflect the actual changes. Perhaps a more robust examination is needed to better define the location and severity of the injury in subjects classified with an incomplete injury by the ISNCSCI. Currently, we are establishing neuroimaging criteria based on diffusion tensor imaging for evaluating the location and severity of SCI in children and youth.21 This technique shows promise in quantifying viable neural tissue within the injured spinal cord in SCI.

The poor-to-moderate agreement at the S4/5 dermatomes in subjects with incomplete injury is also of concern. Samdani et al.13 reported that 40% of subjects who underwent the ISNCSCI examinations had no S4/5 sensation but did respond positively to deep anal pressure, and concluded that testing anal sensation is important for determining the American Spinal Injury Association Impairment Scale classification. However, studies22, 23, 24 have indicated the possibility that anal pressure is perceived by patients through an alternative pathway, which is not indicative of spinal cord integrity. With the validity of the test for anal pressure in question and the extent of poor reliability at the S4/5 dermatomes, misclassification of completeness is a modest possibility. A more accurate way of determining SCI severity is needed. Wietek et al.24 used functional magnetic resonance imaging to study cortical activity during anorectal stimulation and showed cortical activation in areas similar to those found in healthy volunteers, but with less extensive activation. Therefore, brain imaging techniques may help to identify the functional pathways that might not be identified with conventional SCI testing.
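The earlier point that a summed score can stay the same while individual levels change can be illustrated with a small example. This is a sketch with invented motor scores for five hypothetical myotomes on one side; the function name is also hypothetical and nothing here is drawn from the study data.

```python
from typing import Dict, Tuple


def level_changes(first: Dict[str, int],
                  second: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {level: (first_score, second_score)} for levels that differ."""
    return {level: (first[level], second[level])
            for level in first
            if first[level] != second[level]}


# Invented scores (0-5 motor scale) from two hypothetical examinations.
exam_1 = {"C5": 5, "C6": 4, "C7": 3, "C8": 2, "T1": 1}
exam_2 = {"C5": 5, "C6": 3, "C7": 4, "C8": 2, "T1": 1}

# The totals match even though two individual levels were scored differently.
print(sum(exam_1.values()), sum(exam_2.values()))
print(level_changes(exam_1, exam_2))
```

In this invented case, a comparison of totals would suggest perfect repeatability, while the level-by-level comparison shows disagreement at two myotomes.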

Study limitations

There are limitations to this study. First, even though we standardized the testing techniques, youths participating in the study had varying degrees of experience with the examination. Thus, youths differed in their knowledge about the examination, and we do not know the effect of this difference on the examination results. Second, as a way to ensure a sufficient number of subjects for analysis, we grouped subjects based on their severity of injury (complete vs incomplete) and not by age or level of injury. Based on our previous work, after 6 years of age, there is no indication that age introduces variability into the examination,11 and thus, we did not feel groupings by age were appropriate. Although the strongest design would be to group subjects based on neurological level or motor level, even with 189 subjects, there would be insufficient numbers per level for analysis.11

Conclusion

Inter- and intra-rater agreement was moderate-to-high for myotome testing in both complete and incomplete SCI subjects. The agreement of sensory testing (PP and LT) was worse than that of motor testing, with no high agreement for any dermatome in incomplete subjects. Agreement in complete subjects was slightly better; however, fewer than 25% of dermatomes had high agreement for either sensory modality. These results suggest that caution should be used when determining whether a subject has made true improvements in the ISNCSCI overall scores (total motor and total sensory) or whether the change was simply due to a degree of variation at the individual dermatome and myotome level.

Data archiving

There were no data to deposit.