Introduction

The question of how the spinal cord can be repaired after damage has been pursued for several years. Potential interventions for the recovery of function following spinal cord injury (SCI) likely lie in a combination of pharmacological, surgical and rehabilitation approaches. In order to test the efficacy of any such approaches, reliable, valid and responsive measures of neurological and functional outcome are essential.1 From the patient's perspective, improvements in the ability to function in everyday activities will be the most meaningful determinant of treatment efficacy.

Almost half of all spinal cord injuries are functionally incomplete,2 meaning that there is some sparing of function below the level of the lesion (American Spinal Injury Association Classification C or D).3 The probability of functional recovery is more substantial in incomplete SCI, and the majority of people with incomplete SCI are able to recover walking.2, 4, 5 Thus given that a significant proportion of people who sustain an SCI may be expected to recover some walking, ambulation outcomes will be an important measurement of the efficacy of new medical or rehabilitation interventions.

Walking not only involves the ability to move the legs, but also requires the intricate coordination of neural commands to regulate upright balance and posture and the ability to adapt gait to environmental constraints. Functional ambulation may therefore be defined as ‘the ability to walk, with or without the aid of appropriate assistive devices (such as prostheses, orthoses, canes or walkers), safely and sufficiently to carry out mobility-related activities of daily living.’6 When assessing ambulation, we considered two constructs: capacity and performance.7 Using this framework, we define ambulatory capacity as the highest level of walking function achieved within a standardized environment. Ambulatory performance is defined by what an individual actually achieves in his or her environment. Performance therefore is dependent not only on the ability of an individual to execute a given task, but also on the constraints posed by the surrounding environment. The measurement of such outcomes will be important for assessing the efficacy and impact of clinical interventions for enhancing function in people with SCI. Therefore, the objective of this review was to examine the evidence for the validity, reliability and sensitivity of current outcome measures used to measure ambulation in the SCI population.

Materials and methods

Outcome measures were identified using a keyword search of electronic databases (MEDLINE/PubMed, CINAHL, EMBASE, PsycINFO) from 1980 to 2007. The following keywords were used in the search: spinal cord injury/paraplegia/tetraplegia/quadriplegia, ambulation/gait/walking, measure/scale. Reference lists of the retrieved studies were also hand searched for additional studies. An outcome measure was included if information was published on its psychometric properties (that is, reliability, validity and/or responsiveness) using individuals with SCI. Multidimensional outcome measures were included only if psychometric data were available for specific gait/ambulation subscales.

The following psychometric properties were assessed for each of the measures: reliability, validity and responsiveness. Reliability includes reproducibility and internal consistency. Reproducibility examines the degree to which the score is free from random error and includes test–retest reliability and interobserver reliability. Reliability coefficients have been reported using Pearson's product-moment correlation coefficient (r), Spearman's rank correlation (ρ), kappa-statistic (κ) or intraclass correlation coefficients. Internal consistency assesses the homogeneity of the items and is measured using Cronbach's α (α) or split-half reliability. The minimum standard for reproducibility and internal consistency coefficients for group comparisons is 0.70.8 We considered reliability coefficients ≥0.75 excellent, 0.40–0.74 moderate and ≤0.39 poor.9 For internal consistency, α scores ≥0.80 were considered excellent, 0.70–0.79 adequate and ≤0.69 poor.9

Validity assesses whether the instrument actually measures what it intends to measure. Since there are no ‘gold standards’ for measuring ambulation in SCI, we assessed construct validity. Validity was considered poor if correlation coefficients were ≤0.49, moderate if between 0.50 and 0.69 and excellent if ≥0.70.10 The minimum standard for validity is a correlation coefficient of 0.60.11
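The rating bands described above can be sketched as a small set of lookup functions. This is only an illustration: the function names are hypothetical, the cut-offs are read as lower bounds (for example, 0.75 and above is excellent), and taking the magnitude of a negative validity correlation is an assumption rather than something stated in the review.

```python
def classify(value, excellent_cutoff, middle_cutoff, middle_label):
    """Rate a coefficient against two lower-bound cut-offs."""
    if value >= excellent_cutoff:
        return "excellent"
    if value >= middle_cutoff:
        return middle_label
    return "poor"


def rate_reliability(coefficient):
    # Reproducibility (test-retest, interobserver): cut-offs 0.75 and 0.40.
    return classify(coefficient, 0.75, 0.40, "moderate")


def rate_internal_consistency(alpha):
    # Cronbach's alpha: cut-offs 0.80 and 0.70.
    return classify(alpha, 0.80, 0.70, "adequate")


def rate_validity(correlation):
    # Construct validity: cut-offs 0.70 and 0.50.
    # Assumption: only the strength of the correlation matters, so the sign
    # is dropped (a timed test can correlate negatively with a scale score).
    return classify(abs(correlation), 0.70, 0.50, "moderate")


if __name__ == "__main__":
    print(rate_reliability(0.82))
    print(rate_internal_consistency(0.75))
    print(rate_validity(-0.60))
```

Coefficients reported to two decimals fall cleanly into one band; a value such as 0.395 would be rated poor under this reading.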

Responsiveness determines the ability of a measure to detect clinically important change over time. When possible, effect size (Cohen's d effect size estimate with Hedges adjustment for sample size)12 was calculated from available data to measure responsiveness. When only median and range values were reported, the mean and standard deviation of the data were estimated according to the method of Hozo et al.13 in order to calculate effect size. An effect size of 0.20 is considered small, 0.50 medium and >0.80 large. Floor or ceiling effects, which result when scale items are inappropriately scaled at either extreme, were considered problematic when >20% of subjects received either minimum (floor) or maximum (ceiling) scores.10
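The responsiveness calculations can be illustrated with a short sketch. Several points are assumptions rather than details given in the review: the effect size is computed with a pooled standard deviation across two assessments, the small-sample correction uses the common approximation 1 − 3/(4N − 9), and the median/range rules follow the sample-size bands usually attributed to Hozo et al. (they should be checked against the original paper before use). All function names are hypothetical.

```python
import math


def hozo_mean_sd(minimum, median, maximum, n):
    """Estimate mean and s.d. from a median and range (assumed Hozo-style rules)."""
    # Mean: weighted formula for small samples, otherwise the median itself.
    mean = (minimum + 2 * median + maximum) / 4 if n <= 25 else median
    data_range = maximum - minimum
    if n <= 15:
        variance = ((minimum - 2 * median + maximum) ** 2 / 4 + data_range ** 2) / 12
        sd = math.sqrt(variance)
    elif n <= 70:
        sd = data_range / 4
    else:
        sd = data_range / 6
    return mean, sd


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d with a small-sample (Hedges) correction, using a pooled s.d."""
    pooled_sd = math.sqrt(
        ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    )
    d = (mean_2 - mean_1) / pooled_sd
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return d * correction


def floor_ceiling_flags(scores, scale_min, scale_max, threshold=0.20):
    """Return (floor_problem, ceiling_problem) using the >20% criterion."""
    n = len(scores)
    floor_share = sum(s == scale_min for s in scores) / n
    ceiling_share = sum(s == scale_max for s in scores) / n
    return floor_share > threshold, ceiling_share > threshold


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    print(hedges_g(10.0, 2.0, 20, 12.0, 2.0, 20))
    print(hozo_mean_sd(0.0, 5.0, 10.0, 10))
    print(floor_ceiling_flags([0, 0, 0, 5, 10, 20, 20, 3, 4, 6], 0, 20))
```

The resulting effect size would then be read against the small, medium and large benchmarks given above.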

Reproducibility and responsiveness are measures of within-subject variability that together provide information about how well an outcome measure can detect change, while taking into account changes due to measurement error or random variability. The standard error of measurement (s.e.m.) indicates how many units of change in the measure are necessary until a change beyond expected error is detectable.14 When data were available, we calculated the s.e.m. by the equation s.e.m. = s.d. × √(1 − r), where s.d. is the standard deviation of a set of scores and r is the test–retest reliability coefficient of the measurement set.14 In addition, we calculated the smallest real difference (SRD), calculated by 1.96 × s.e.m. × √2, which has been defined as the smallest change that represents a real (clinical) change beyond 0, with 95% confidence.15 It provides an indication of how well an outcome measure would be able to detect a clinically relevant change.15
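As a minimal sketch of these two quantities, assuming the conventional form s.e.m. = s.d. × √(1 − r) together with the SRD expression given above, the calculation could look as follows. The function names and the example inputs are hypothetical and are not taken from any of the reviewed studies.

```python
import math


def standard_error_of_measurement(sd, r):
    """s.e.m. = s.d. * sqrt(1 - r), with r the test-retest reliability."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient r must lie between 0 and 1")
    return sd * math.sqrt(1.0 - r)


def smallest_real_difference(sem):
    """SRD = 1.96 * s.e.m. * sqrt(2), the 95% bound on change due to error."""
    return 1.96 * sem * math.sqrt(2.0)


if __name__ == "__main__":
    # Illustrative inputs only: s.d. of 10 units and reliability of 0.91.
    sem = standard_error_of_measurement(10.0, 0.91)
    srd = smallest_real_difference(sem)
    print(f"s.e.m. = {sem:.2f}, SRD = {srd:.2f}")
```

Because the SRD scales linearly with the s.e.m., a measure with higher test–retest reliability (or a less variable sample) needs a smaller observed change before that change can be called real.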

Results

The search yielded thirteen outcome measures: 10-m walk test (10MWT),16, 17, 18 6-min walk test (6MWT),16, 17 Barthel index (BI),19, 20, 21 Clinical Outcome Variables Scale (COVS),22, 23, 24 Functional Independence Measure (FIM),25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 Functional Standing Test (FST),36 Motor Capacities Scale (MCS),37 Needs Assessment Checklist (NAC),38, 39 Rivermead Mobility Index (RMI), Spinal Cord Injury-Functional Ambulation Index (SCI-FAI),40 Spinal Cord Independence Measure (SCIM-I,41 SCIM-II,42 and SCIM-III43), Timed-Up and Go Test (TUG),17 and the Walking Index for Spinal Cord Injury (WISCI-I44 and WISCI-II45). Five of these measures were omitted for not directly measuring gait/ambulation (FST, MCS) or lacking psychometric data on gait/ambulation subscales in the SCI population (BI, COVS, RMI). One multidimensional measure (NAC) was omitted because its mobility subscale broadly measured transfers, wheelchair skills, as well as ambulation. The remaining seven measures were divided into two categories: timed measures of ambulation (10MWT, 6MWT, TUG and the Temporal/distance Score of the SCI-FAI) and categorical measures of ambulation (FIM, SCIM, WISCI (I, II) and the Gait Score and Assistive Device component of the SCI-FAI). Timed measures reflect the ability to transport the body from one place to another in a timely manner. Two of the categorical measures (FIM, SCIM) are multidimensional scales of function. The psychometric properties of their ambulation-related subscales (FIM locomotor, FIML; SCIM mobility (indoor/outdoor), SCIMIOMob) will be reviewed here.

Instrument properties

Table 1 describes the properties (number of items, scoring method) of each instrument as well as the ambulatory status of the subjects who were tested on these measures.

Table 1 Instrument properties

Timed measures of ambulation

Three of the measures (10MWT, 6MWT, SCI-FAI) include timed measures of overground gait speed or distance, while one measure (TUG) measures the time required to rise from a chair, walk 3 m and return to the chair. Note that only subjects who are able to walk (for example, take at least eight steps40) were used for the psychometric testing of these measures (Table 1).

The 10MWT has been widely used as an outcome measure in gait training studies for people with incomplete SCI. There have been several variations to this measure. Some investigators calculate the time required to walk over the full 10-m walkway,47 while others account for acceleration and deceleration and measure the time required to walk over the middle 6 m of a 10-m walkway.48 van Hedel et al.16, 18 have tested the responsiveness of the 10MWT whereby a ‘flying start’ is permitted (the patient walks 14 m and the time required to cover the middle 10 m is measured).
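Because the timed distance differs between these variants, the same stopwatch reading corresponds to different gait speeds. The sketch below makes that explicit; the protocol labels and the function name are hypothetical, and only the distances come from the descriptions above.

```python
# Distance (in metres) over which time is recorded for each 10MWT variant.
TIMED_DISTANCE_M = {
    "full_walkway": 10.0,  # timed over the whole 10-m walkway
    "middle_6m": 6.0,      # timed over the middle 6 m of a 10-m walkway
    "flying_start": 10.0,  # timed over the middle 10 m of a 14-m walk
}


def gait_speed(protocol, elapsed_s):
    """Return gait speed in m/s for a given protocol and elapsed time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return TIMED_DISTANCE_M[protocol] / elapsed_s


if __name__ == "__main__":
    for name in TIMED_DISTANCE_M:
        print(name, round(gait_speed(name, 12.0), 2), "m/s")
```

Recording the protocol alongside the time is what allows speeds from different studies to be compared.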

The 6MWT measures the distance a patient is able to walk over 6 min (including rests as needed).49 It was originally developed as a test of aerobic capacity in patients with cardiopulmonary disease.50

The SCI-FAI consists of three domains: a Gait Score indicating the quality of gait, an Assistive Devices score indicating the use of assistive devices, and a Temporal/Distance score. The first two domains are therapist rated, while the Temporal/Distance score consists of a timed walking test (distance walked in 2 min) and a self-report ambulation classification score.40

The TUG is a timed walking test designed to measure mobility and balance. It was originally developed as a clinical measure of balance in elderly individuals.51 The individual is instructed to stand up from an arm chair, walk 3 m, return to the chair and sit down at their preferred walking speed. Assistive devices can be used.

Categorical measures of ambulation

Four of the measures describe ambulation with respect to the extent of external support required, in addition to certain domains of the SCI-FAI. Two of these measures are devoted entirely to ambulation (SCI-FAI, WISCI), while the others are subscales of larger multidimensional measures (FIML, SCIMIOMob). Note that the categorical measures can capture both nonwalkers as well as walkers or wheelchair users (Table 1).

In addition to the timed domains described above, the SCI-FAI also includes domains that describe the quality of gait (Gait Score) as well as the use of assistive devices. The Gait Score domain was developed in collaboration with physical therapists who helped to identify and rank its six parameters: weight shift, step width, step rhythm, step height, foot contact and step length.40

In the WISCI, walking is evaluated by determining the patient's dependence on physical assistance, braces or walking aids to cover 10 m. The original WISCI consisted of 19 levels ranging from ‘ambulates in parallel bars, with braces and physical assistance of two persons, less than 10 m’ (Level 1) to ‘ambulates with no devices, no braces and no physical assistance, 10 m’ (Level 19).44 The WISCI has since been modified (WISCI-II) to include two additional scale items for a total of 21 levels: Level 0 to indicate inability to stand or walk with assistance and a new Level 18 to indicate no devices, use of braces and no assistance.45

The FIM is a multidimensional scale that assesses the burden of care and functional impairment across a range of domains.52 The motor subscale includes 2 locomotor-related items (FIML): walking or wheelchair propulsion and stair climbing. Each item is scored on a seven-point scale ranging from 1 (total dependence/maximum assistance) to 7 (total independence). The FIML does not consider the use of devices or braces to enable independence. The tool is completed by trained health professionals who observe patient performance.

The SCIM is a new disability scale developed by Catz and colleagues,41 that specifically addresses patients with spinal cord lesions in order to describe their ability to accomplish activities of daily living. It has since undergone two revisions,42, 43 the most recent one resulting in the SCIM-III.43 The Mobility (indoors/outdoors) subscale of the SCIM (SCIM-IIIIOMob) consists of six items (mobility indoors, mobility for moderate distances (10–100 m), mobility outdoors (>100 m), stair management (up and down three steps), transfers: wheelchair–car, transfers: ground–wheelchair) that are each scored on a 2- to 9-level categorical scale. Higher scores reflect a higher level of independence.

Note that the FIML and SCIM-IIIIOMob are not pure ambulation measures since wheelchair use is one of the scoring options. The FIML requires one to indicate whether a wheelchair or walking was used as the mode of locomotion. The SCIMIOMob is scored along a continuum, which extends from wheelchair use to walking with aids to walking without aids. These measures, therefore, may be more applicable to a broader range of individuals with SCI.

Instrument reliability

Table 2 provides details on the available data supporting the reliability of each of the instruments. Most of the timed measures of ambulation showed excellent test–retest and interobserver reliability, with the 10MWT, 6MWT and TUG having the strongest scores. The reliability of most of the measures that assess ambulatory dependence was poorer, with only the WISCI and SCIM-IIIIOMob showing comparably high reliability.

Table 2 Reliability

The s.e.m. values show that the amount of change necessary to detect differences beyond expected error was 0.05 m s−1 for gait speed over 10 m, 16.5 m for the distance covered over 6 min, 3.9 s for the TUG, 0.7 points on the Gait Score of the SCI-FAI and 1.6 points on the FIML. The SRD tells us the minimum difference required for each of the outcome measures to detect real (clinical) change. Accordingly, a change of at least 0.13 m s−1 on the 10MWT, 45.8 m on the 6MWT, 10.8 s on the TUG, 1.9 points on the SCI-FAI Gait Score and 4.4 points on the FIML is required before a real (clinical) change can be detected.
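The link between the two sets of figures can be checked by applying the SRD expression from the methods (1.96 × s.e.m. × √2) to the reported s.e.m. values. The snippet below is only a consistency check; the dictionary keys are hypothetical labels, and the inputs are the rounded s.e.m. values quoted above, so small rounding differences from the reported SRDs are to be expected.

```python
import math

# Rounded s.e.m. values as quoted in the text.
REPORTED_SEM = {
    "10MWT (m/s)": 0.05,
    "6MWT (m)": 16.5,
    "TUG (s)": 3.9,
    "SCI-FAI Gait Score (points)": 0.7,
    "FIML (points)": 1.6,
}

# Multiplier from SRD = 1.96 * s.e.m. * sqrt(2), roughly 2.77.
SRD_FACTOR = 1.96 * math.sqrt(2.0)

recomputed_srd = {name: SRD_FACTOR * sem for name, sem in REPORTED_SEM.items()}

if __name__ == "__main__":
    for name, srd in recomputed_srd.items():
        print(f"{name}: SRD is about {srd:.2f}")
```

In every case the SRD is a little under three times the s.e.m., which is why a measure with a small s.e.m. can still need a sizeable change before that change counts as real.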

Instrument validity

Table 3 provides details on the available data supporting the validity of each of the instruments. Since there is no ‘gold standard’ of an ambulation outcome measure for SCI, construct validity was assessed. Construct validity of the timed measures of ambulation (10MWT, 6MWT, TUG) between each other and with the WISCI-II is very strong.17 Rasch analysis showed that the SCIM-IIIIOMob has excellent construct validity.43 The SCIM-IIOMob also correlated strongly with the WISCI-II (ρ=0.97, 284 patients).46 However, some limitations in the SCIM-IIOMob's validity for assessing ambulation are suggested by the finding that individuals with a WISCI-II score of 13 (ambulates with walker, no braces, no physical assistance) had no comparable SCIM-IIOMob score.46 The validity of the FIML appears poor, according to misfits of its items with the Rasch model.30, 31 Construct validity of the FIML with other measures of neural impairment ranged from poor (for example, admission ASIA motor score) to excellent (discharge ASIA motor score).30

Table 3 Validity

Instrument responsiveness and floor/ceiling effects

Available data on the responsiveness of each measure is presented in Table 4. The timed measures of ambulation (10MWT, 6MWT) were able to detect changes in ambulation between 1 and 3 months and 3 and 6 months postinjury in patients who were able to stand or walk within the first 3 months after injury.16 Among the categorical measures of ambulatory dependence, there was also a large effect size in WISCI-II scores in changes between 1 and 3 months postinjury.16

Table 4 Responsiveness

The measures of ambulatory dependence exhibit floor (SCIM-IIIIOMob, WISCI-II) and ceiling (FIML, WISCI-II) effects. In particular, the WISCI-II demonstrated severe ceiling effects among ambulatory SCI individuals55 and severe floor effects among a representative group of SCI patients upon discharge from rehabilitation.46 Floor/ceiling effects have not been evaluated for the timed walking tests or the SCI-FAI (Table 5).

Table 5 Floor/ceiling effects

Discussion

Ongoing efforts in basic science and applied clinical research promise to bring new strategies for enhancing functional recovery following SCI. Reliable, valid and responsive outcome measures will be necessary for accurately assessing the efficacy of any new intervention.1 In this review, we evaluated the psychometric properties of available measures for assessing functional ambulation in the SCI population.

The timed measures of ambulation (10MWT, 6MWT, TUG) consistently fared the best in terms of test–retest and interobserver reliability, construct validity and responsiveness to change (particularly over the first 3 months postinjury) in a subgroup of ambulatory SCI patients. Of the categorical measures, the reliability of the SCI-FAI, SCIMIOMob and WISCI was moderate to excellent while the FIML fared worst, with poor to moderate reliability and poor internal consistency. The s.e.m. provides valuable insight into how much change is necessary to detect differences beyond random measurement error and the SRD indicates the minimum difference between two scores that represents a real change beyond 0.15 Beckerman et al.15 emphasize that there must be a distinction between the SRD and what clinicians define as ‘clinically relevant’. The SRD is a psychometric property of an instrument that indicates how well it can detect a given size of change (that is, the 10MWT can detect a change of at least 0.13 m s−1). For an instrument to be sensitive to a clinically relevant level of change, its SRD should be less than what is deemed clinically relevant.15 We recommend that future work focus on establishing the relationship between clinically relevant differences and the SRD of these measures in order to improve our ability to accurately measure functional change in SCI.

Construct validity of these scales with respect to other measures of ambulation or neurological impairment ranged from poor (for example, FIML Rasch analysis30, 31) to excellent (SCIM-IIIIOMob43). The responsiveness of the WISCI-II to change over the first 3 months postinjury was particularly strong in a subgroup of ambulatory SCI patients;16 effect sizes for other comparisons and other scales were somewhat more modest. Many of the categorical measures of ambulation also exhibited floor or ceiling effects, especially the FIML upon discharge (ceiling), the SCIM-IIIIOMob upon admission to rehabilitation (floor) and the WISCI-II (floor and ceiling) upon discharge from rehabilitation. One of the advantages of the timed walking tests is that there is no conceivable ceiling effect. However, these tests are valid only for those individuals who are able to walk, whereas a more representative SCI population can be evaluated with the categorical tests. In addition to their utility in tracking both walkers and nonwalkers, categorical tests can also capture the transition from nonambulatory to ambulatory status.

Although the FIM is often considered the gold standard for assessing activities of daily living, it is not SCI specific and the results of this review reveal its limited reliability, construct validity and responsiveness to different lesion levels, as well as ceiling effects in people with paraplegia. The poor reliability of the FIML may be attributed to the fact that this subscale groups walking and wheelchair mobility together in one item when they are actually two separate modes of locomotion. One recommendation to improve internal consistency of the FIML is to split these items.26 Rasch analysis has also suggested that the psychometric properties of the FIM could be improved by reducing the seven-point scale rating to a four- or five-point scale.31

On the other hand, the SCIM, which was developed specifically for the SCI population,41 shows promise in becoming the gold standard for the comprehensive assessment of basic function in SCI.56 The SCIM-IIIIOMob has been shown to have excellent interobserver reliability and construct validity43 and may be a useful indicator of functional ambulation as part of a larger multidimensional assessment of function. Further research should determine whether the revised version of the SCIM (SCIM-III) has improved precision in capturing all levels of ambulatory ability.46

The WISCI-II, which was also developed specifically for the SCI population, is solely used for the assessment of ambulatory function rather than overground locomotion in general (that is, where both wheelchair use and walking are options).46 It therefore provides a more comprehensive consideration of the use of braces and assistive devices to achieve overground ambulation not found in the other categorical measures. However, the WISCI exhibits ceiling effects, which could limit its use in assessing individuals with only minor impairments. The WISCI-II also does not consider gait speed or energy consumption and does not provide any indication of endurance since the distance covered is only 10 m. It has been suggested that the WISCI-II would benefit from additional information on walking speed to improve responsiveness and to decrease its ceiling effect.17, 46, 55

One suggestion has been to use a combination of the WISCI-II and a timed test (for example, 10MWT) to assess functional ambulation in individuals with SCI.1, 55 Walking velocity measured by either the 6MWT or the 10MWT is comparable in people with incomplete SCI who are able to complete both tests, although care must be taken in exactly how the tests are performed (for example, whether a flying start is permitted for the 10MWT or the dimensions of the track to be used for the 6MWT).16, 18 Due to its shorter and easier implementation, the 10MWT has been recommended as the preferred timed ambulation test in people with SCI.16

Despite the appeal of using these quick and simple measures of functional ambulation (that is, WISCI-II and 10MWT), important information about the quality of walking may be missed. Only one of the ambulatory outcome measures we examined includes an assessment of the quality of the gait pattern (SCI-FAI). Further work is required to elucidate the relationship between common gait deviations and functional outcomes (for example, gait speed and dependence on ambulatory aids). In addition, the usefulness of such measures is limited to those individuals who are able to take at least some steps. Therefore, care must be taken in choosing appropriate outcome measures for specific subgroups of the SCI population. Categorical measures such as the SCIM-III may be valuable for capturing a broader range of locomotor abilities among SCI individuals, while scales such as the WISCI-II or the timed tests can provide a more precise and specific measure of ambulation.

An important safety aspect of ambulation is the ability to maintain upright balance while walking. None of the measures we reviewed directly measure balance. The Berg Balance Scale is a commonly used clinical measure of static and dynamic balance in the elderly, stroke and Parkinson's populations,57, 58 but has yet to be validated for SCI. There are otherwise limited tools available for assessing balance capacity during walking and none to our knowledge would be suitable for clinical settings. The development and validation of such a dynamic balance-walking tool would be useful as an adjunct to ambulation outcome measures in the SCI population.

Most of the measures we reviewed assess level gait, ranging from simple timed tests such as the 10MWT or 6MWT to the SCI-FAI, which describes specific qualities of leg movement during gait. Although level gait may be captured by a combination of the 10MWT and WISCI-II, these measures lack consideration of ‘real-life’ environmental constraints (mobility).59 The only aspect of mobility, beyond level gait, covered by these measures is stair-climbing ability (rated in the FIML and SCIMIOMob). Environmental surveys to help identify the various challenges faced in everyday walking situations are only beginning to emerge.60, 61 Some common environmental challenges, such as stair-climbing and obstacles, have already been chosen for ambulatory scales for the stroke population. Measures, such as the Emory Functional Ambulation Profile,62 may provide a basis for developing an SCI-specific scale of functional ambulation. Measures that go beyond walking in a straight line may also prove useful in functional ambulation assessments in SCI. The L-test,63 which requires different degrees of turning toward both sides of the body, is one such measure whose value should be assessed for the SCI population.

The International Classification of Functioning (ICF) classifies ‘capacity’ as the highest level of functioning that a person may reach in a given domain, generally assessed in a ‘standardized’ environment.7 The measures described in this review fall within the construct of capacity as they are typically evaluated in laboratory or hospital settings and participants are trying to the best of their ability to obtain a better score. In contrast, the ICF classifies ‘performance’ as what an individual does in his or her current environment.7 The measurement of such outcomes will be important for assessing the efficacy and impact of clinical interventions for enhancing function in people with SCI. Direct measurement of such parameters is being implemented by some researchers with the emergence of improved technology for monitoring daily ambulation (for example, StepWatch3 Activity Monitor)64 or remote tracking systems60 that can monitor the quantity and quality of ambulation outside the standard, controlled environment of laboratories and clinics. Recently, a self-selected WISCI-II score corresponding to the level normally used to walk in the home or community (performance measure) was compared to the maximal WISCI-II achieved in a controlled setting (capacity measure) in a group of ambulatory, chronic SCI subjects.55 It was found that subjects walked faster and with less energy expenditure at the self-selected WISCI-II level in the home or community. This highlights the importance of how changes in ambulation should be assessed in clinical trials (that is, whether the usual or minimal level of assistive devices should be used).55 The reliability of the SCIM-II when assessed by interview was also recently examined and found to be comparable with assessment by observation in a hospital setting.65 Further research to validate these and other measures using interview or self-report in the home or community is encouraged to provide better performance measures of ambulation, which will provide complementary information to established capacity measures (for example, 10MWT).

Future work should also be directed toward determining the contributing factors that enable independent community walking. For example, it has been suggested that self-paced walking speeds of 0.4 and 0.8 m s−1 are the minimum criteria for limited and unlimited community ambulation in people with stroke.53 The ability to manage curbs is also considered a critical task for independent community walking.53 Individuals who achieve this level of walking also tend to score better on a quality of life scale.66 Similar criteria have yet to be determined for the SCI population.