Introduction

Valid and reliable outcome measures are needed in clinical trials on spinal cord injuries (SCIs) to develop effective treatment interventions. This need is particularly pressing for walking function, the recovery of which is a principal goal for subjects with SCIs.1 Many clinical trials are therefore likely to be geared toward walking recovery and to require valid outcome measures. Outcome measures that are related to walking function include measures of walking capacity, such as short-distance timed walks, long-distance walks (6-min walk) and the walking index for SCI (WISCI II).

The WISCI was introduced in 20002 and modified in 2001 (WISCI II)3 as a measure of the capacity to walk for use in clinical trials, incorporating the use of walking aids, braces and physical assistance on a 21-point scale. The WISCI was ranked by an international group of SCI clinicians and investigators from most impaired to least impaired, and has demonstrated theoretical construct and face validity. It was subsequently compared with four scales in a clinical population of mixed SCI and spinal cord lesions to validate its retrospective criteria (versus other scales).4

In 2006, the WISCI was used in a multicenter, randomized clinical trial, as assessed by blinded observers, and correlated well with lower extremity motor score, balance, walking speed, 6-min walking distance and locomotor functional independence measure score, validating its prospective criteria.5 Since then, the WISCI II has enjoyed increased popularity and acceptance.6

A recently published systematic search of the literature on WISCI/WISCI II (2013)7 clarified its use and further needs. Although the review concluded that its validity is well established, it summarized the responses to suggestions for additional examination of its reliability, pointing out that the initial assessment of the reliability of the WISCI (2000)2 dealt with agreement among experts from eight countries with regard to the correct ranking of WISCI levels by video images. Further, the review concluded that a more ‘crucial test’ of a measure of capacity requires test–retest assessment of patients following SCI by trained raters. Although Marino et al.8 and Burns et al.9 fulfilled such a test in chronic subjects, only Scivoletto et al.10 have studied this issue in acute subjects in a preliminary report of 19 subjects.

For these reasons, this study was performed to demonstrate the interrater (IRR) and intrarater reliability of the WISCI II at maximum levels in patients with acute SCIs.

In addition, we examined the smallest real difference (SRD) in acute SCI subjects. The SRD is derived from test–retest reliability and within-subject variance8 and quantifies measurement noise: following an intervention, a change in score would need to exceed the SRD to be considered ‘real.’

Patients and methods

Subjects

Study subjects were recruited from the Spinal Unit, IRCCS S. Lucia, Rome, Italy. Candidates had to have sustained a traumatic SCI within the previous 3 months (acute lesions per other studies5). Inclusion criteria were a history of traumatic SCI, incomplete motor status (American Spinal Injury Association (ASIA) impairment scale (AIS) C or D) and a motor level of C4–L1 inclusive per the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI).11 Neurological status was confirmed by examination before testing the WISCI II level. Informed written consent was obtained from all patients, and the study was conducted as per the Declaration of Helsinki.

Assessments

The neurological status was assessed by SCI physicians and trained physical therapists as per the ISNCSCI.11 The key upper and lower limb muscles were graded by manual muscle testing on a six-point scale for each limb, and the AIS grade was determined for each subject.

As recommended for reliability studies,12 the WISCI II was assessed by physical therapists who were trained on the use of the WISCI II, and instructed with regard to testing for IRR and intrarater reliability. Two therapists, A and B, tested subjects on 2 different days (within 48/72 h) for the maximum WISCI II levels as per the following protocol. We chose a shorter time than recommended,12 because the patients were in the acute/subacute phase, and therefore changes in strength and function may occur in days rather than weeks.13

Maximum level was defined as the level that was (1) safe during training in therapy compared with a hospital environment, (2) for a 10-m distance, and (3) judged to be safe by the training therapist. Thus, we adopted a more rigorous definition of ‘safe’ than what is applied to a chronic subject. ‘Safe’ was defined as being able to keep adequate balance to prevent falling, clear and place the foot flat on the surface, walk with minimal lurching, and maintain an upright posture.8, 9 Attaining ‘safe’ might have required physical assistance, the use of braces and other supports, and appropriate walking aids.

The maximum capacity of WISCI II levels was determined as follows:

  1. Therapist A determined the maximum WISCI II level for each individual at the time of early mobilization following injury during inpatient hospitalization. The therapist confirmed that the subject could safely ambulate at this level by observation; then, the maximum WISCI II level was recorded by the same therapist.

  2. Therapist B obtained the same history from subject 1 and tested him on the same day as per the protocol above. Therapist B was blinded to the evaluation by therapist A, which meant that neither therapist was present for or discussed his colleague’s evaluation.

  3. Therapist B repeated the protocol for subject 1 in 48/72 h, after which therapist A did so. The therapist who last evaluated the subject on the first day evaluated him first on the second day.
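The rater order described in the steps above can be sketched as a small helper. This is only an illustration: the function name `rater_schedule` and the labels "day 1" and "day 2" are hypothetical and do not come from the study protocol.

```python
def rater_schedule(day1_order=("A", "B")):
    """Return the order in which therapists test a subject on each day.

    The therapist who evaluated the subject last on the first day
    evaluates first on the second day, so the order is simply reversed.
    """
    return {
        "day 1": list(day1_order),
        "day 2": list(reversed(day1_order)),
    }


if __name__ == "__main__":
    for day, order in rater_schedule().items():
        print(day, "->", ", then ".join(order))
```

Reversing the order on the second session means that neither therapist is always the first to test the subject.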

Statistical analysis

Statistical analysis was performed using SPSS for Windows, version 12.0 (Chicago, IL, USA). Descriptive statistics were calculated for all variables as median and interquartile range and full range. IRRs and intrarater reliabilities were determined for maximum WISCI II levels using intraclass correlation coefficients (ICCs) and a one-way, random effects model.9
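For readers unfamiliar with the one-way, random effects ICC, the following minimal sketch shows how a single-measure coefficient can be computed from a one-way analysis of variance. It is not the SPSS procedure used in the study. The function name `icc_one_way` and the example scores are hypothetical, and the choice of the single-measure form, ICC(1,1), is an assumption because the text does not state which form was reported.

```python
def icc_one_way(ratings):
    """Single-measure ICC(1,1) from a one-way random-effects ANOVA.

    ratings: list of rows, one row per subject, each row holding the
    k scores given to that subject (e.g. by k raters or on k days).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of scores")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denom


if __name__ == "__main__":
    # Hypothetical WISCI II levels from two raters (not study data).
    example = [[8, 8], [13, 13], [16, 15], [20, 20]]
    print(round(icc_one_way(example), 3))
```

In this model the raters are not treated as a separate source of variance, so any disagreement between them counts as within-subject error.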

Reproducibility of the WISCI II scale was assessed by calculating the SRD.8, 9, 14 As reported by Marino et al.8 and Burns et al.,9 the SRD is a function of the standard error of measurement (s.e.m.) that assesses the test–retest reproducibility of a measure by calculating the variability of measurements in the same individual.14 The s.e.m. is the square root of the within-participant variance. The SRD was calculated as SRD95 = s.e.m. × √2 × 1.96.
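The calculation can be illustrated with a short sketch. It assumes that the within-participant variance is estimated as the within-subject mean square of the repeated scores, which the text does not specify. The function names and the example scores are hypothetical and are not study data.

```python
import math


def within_subject_variance(scores):
    """Within-subject mean square for repeated scores.

    scores: list of rows, one row per participant, each row holding
    that participant's k repeated measurements (k >= 2).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 1 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 1 participant with the same number (>= 2) of scores")
    squared_dev = 0.0
    for row in scores:
        mean = sum(row) / k
        squared_dev += sum((x - mean) ** 2 for x in row)
    return squared_dev / (n * (k - 1))


def srd95(scores):
    """SRD95 = s.e.m. x sqrt(2) x 1.96, with s.e.m. = sqrt(within-subject variance)."""
    sem = math.sqrt(within_subject_variance(scores))
    return sem * math.sqrt(2) * 1.96


if __name__ == "__main__":
    # Hypothetical test-retest WISCI II levels (not study data).
    example = [[10, 10], [12, 13], [20, 20], [16, 15]]
    print(round(srd95(example), 3))
```

The factor √2 reflects that a change score is the difference of two measurements, each carrying its own error, and 1.96 gives the 95% bound.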

Results

Our study group comprised 33 patients (28 males and 5 females), with a median age of 44 years (interquartile range 28 and full range 69); the median time since onset of SCI was 40 days (interquartile range 32 and full range 73). With regard to lesion level, 20 patients had a lesion at the cervical level versus 8 at the thoracic level and 5 at the lumbar level. All patients but one were AIS grade D; and one patient was classified as AIS grade C and had a lesion at the lumbar level (Table 1).

Table 1 Patients’ features

The ICCs for intrarater reliability were 0.999 for the maximum WISCI II score for therapist A and 0.979 for therapist B (Table 2). The IRR reliability for the maximum WISCI II score was 0.996 on day 1 compared with 0.975 on day 2 (Table 2). These values were comparable when subjects with paraplegia and tetraplegia were examined separately (Table 2). Raters differed in maximum WISCI II evaluation for seven subjects (nos. 5, 13, 16, 17, 19, 25 and 29; Table 1).

Table 2 Inter- and intrarater reliability

The reproducibility of the WISCI II was supported by the SRD (Table 3). The SRD was 0.883 for the maximum WISCI II score, and 1.112 and 1.212 for subjects with tetraplegia and paraplegia, respectively.

Table 3 Reproducibility

Discussion

The aim of this study was to assess the reliability and reproducibility of the highest WISCI II level in patients with acute SCI. Although the WISCI is a relatively simple measure and detailed instructions exist on how to evaluate and progress the patients along the various levels (http://www.spinalcordcenter.org/research/wisci_guide.pdf), we decided to undertake this study for two reasons. First, acute SCI patients differ from chronic ones because they exhibit rapid motor recovery, particularly when the lesion is incomplete, as demonstrated in a number of studies.13, 15, 16, 17 Second, acute SCI patients show fluctuations in their clinical status due to factors such as fatigue and psychological distress. They may also be more difficult to evaluate, especially with regard to safety. Consistent with this, in our previous pilot study on the IRR and intrarater reliability of acute subjects,10 4 of 21 (19%) subjects showed a difference in maximum WISCI level in at least one of the four assessments. By contrast, Burns et al.9 reported that in chronic subjects, the assessment of maximum WISCI II level by the two examiners differed in two of 63 (3%) cases, with higher reliability coefficients than those reported in our study. Previous studies18, 19 state that measurement error, and thus the reliability of a measure, is not a fixed property but depends on the studied population. Therefore, to improve the generalizability of the WISCI II, we decided to enlarge our pilot study on acute SCI patients.

The rationale for examining the highest WISCI II level has been presented by several studies of chronic patients.8, 9, 20

Reliability and responsiveness

The reliability of the WISCI was established in its development when a videotape of patients who were walking at each level (40 randomized clips) was circulated to SCI experts. The IRR was 1.00 across 24 individual participants and 8 participating teams.2 However, the assessment of reliability at that stage required consensus on the number and types of aids and assistants with which the person was walking. Moreover, it has been argued that further evidence of reliability and responsiveness is needed.15, 20, 21, 22

To date, reliability studies on the WISCI are available only in chronic SCI patients.8, 9 Recently, Marino et al.8 reported that for chronic SCI patients, the IRR and intrarater reliability were both 1.00 for the self-selected (SS) WISCI II level. The intrarater reliability for the maximum WISCI II level was 1.00 and the IRR was 0.98. The progression from self-selected to maximum WISCI II level also showed high agreement between and within therapists. In another study of 76 subjects with chronic SCI, Burns et al.9 reported excellent reproducibility of the WISCI II.

Although Burns et al.9 state that ‘in participants with acute SCI, validity and reliability have been demonstrated for both timed walking tests (for example, 10 and 6 min walk test) and categorical scales (for example, WISCI II)9’ with multiple references, this requires clarification. The studies cited for the WISCI relate only to validity, whereas the cited evidence of reliability relates only to the timed walking tests.

With regard to acute SCI, Van Hedel et al.23 studied the walking ability of a small cohort of patients (N=22) with acute SCI using several walking measures (timed up and go; 10 m and 6 min walk tests), but they assessed the IRR and intrarater reliability only for the timed tests. Although they used the WISCI to validate these measures in the same report,15 they did not perform reliability studies of the WISCI.

In this study, we have demonstrated the IRR and intrarater reliability of the maximum WISCI II level in patients with acute SCI, which was excellent (0.975–0.999). Test–retest reliability indicates the level of measurement error, or ‘noise,’ for an outcome measure—that is, how similar the values are when measuring an unchanged parameter on more than one occasion. Increased reliability makes it easier to differentiate real change from noise.

Reproducibility

The reproducibility of an instrument is an index of the precision of single measurements and is a function of its test–retest reliability. Better reproducibility implies greater precision, whereas high variability is associated with poor reproducibility, in which case a larger difference is needed to detect a real change. The SRD is an estimate of the smallest change in a score that can be detected objectively for a client, that is, the amount by which a patient’s score needs to change to ensure that the change is greater than the measurement error. The SRD can be used as an indirect measure of the responsiveness of outcome measures. Only one study has assessed the SRD of the WISCI II in chronic SCI: Burns et al.9 reported an SRD for the maximum WISCI II level of 0.597 and concluded that a change of one WISCI II level in a chronic patient can be interpreted as real.
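Because WISCI II levels are whole numbers, an SRD translates into a minimum number of levels by which a score must change. The sketch below applies the rule that a change must strictly exceed the SRD to be considered real. The function names are hypothetical. The value 0.597 is the SRD reported by Burns et al.9 for chronic subjects; the other inputs are arbitrary illustrations.

```python
import math


def is_real_change(change_in_levels, srd):
    """True if the absolute change in score exceeds the SRD."""
    return abs(change_in_levels) > srd


def min_levels_for_real_change(srd):
    """Smallest whole number of levels that strictly exceeds the SRD."""
    if srd < 0:
        raise ValueError("SRD cannot be negative")
    return math.floor(srd) + 1


if __name__ == "__main__":
    # 0.597 is the chronic-SCI SRD reported by Burns et al.
    print(min_levels_for_real_change(0.597))
    # Arbitrary illustrative values: an SRD above 1 requires two levels.
    print(min_levels_for_real_change(1.5))
    print(is_real_change(1, 1.5))
```

With an SRD below 1, a one-level change already exceeds the measurement error, which matches the conclusion drawn for chronic patients.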

In our study, the smallest real difference of 1.147 (tetraplegics) and 1.682 (paraplegics) for the maximum WISCI II level suggests that in acute SCI patients, an increase in the WISCI II must be at least two levels to be considered a true improvement. The difference between our study and the report by Burns et al.9 is that in the latter, the assessment of maximum WISCI II level by the two examiners differed in 2 of 63 cases, with a low within-subjects s.d. In our study, by contrast, the evaluations differed in 7 of 33 subjects (7/33 with a variation of at least three levels); thus, the s.d. and SRD were higher than in chronic subjects. There are several explanations for this disparity. Acute patients can be less stable in their day-to-day performance, or they could be more difficult to evaluate, particularly with regard to the issue of safety, which clearly influences the choice of the therapist. Further, in the two cases in which the maximum WISCI II level was higher on the second day, a learning effect could not be excluded. Finally, we cannot exclude that the sample size and composition affected the SRD in our series.

Limitations of the study

The interval between tests is a potential limitation in a reliability study, but it must be weighed against stability, because in the acute/subacute phase of SCI, changes in strength and function can occur in days rather than weeks.12, 13 The sample size was not large (N=33), and the proportion of patients with AIS grade D was greater than that with AIS grade C; however, the only other study21 of reliability in acute SCI patients was small (N=22), and most subjects were AIS grade D.

Second, the SRD has been suggested to provide an indication of whether a patient achieved a real improvement beyond measurement noise,24, 25 and to reflect reproducibility and responsiveness of a measure;14 therefore, its use has been suggested to calculate the sample size in clinical trials and to evaluate the primary outcome measure.14 However, it should be highlighted that such an instrument does not consider the subject’s perspective on what could be considered a worthwhile change, nor the costs and the risks, as it depends on the psychometric properties of the outcome measure under evaluation.26 Because the client’s perspective is highly valued in rehabilitation, additional studies on the clinical significance of the WISCI II that include an assessment of the subjects’ perceptions of the impact of the improvement are needed.

Conclusions

This study demonstrates that the WISCI II has high IRR and intrarater reliability. Further, the WISCI II has good reproducibility as assessed by the ICCs and SRDs. Thus, the WISCI II is a reliable and useful outcome measure that can be used to detect changes in walking function following acute/subacute SCI.

DATA ARCHIVING

There were no data to deposit.