Introduction

This paper reviews the evolution of outcome measures used in clinical trials to document neurological and functional improvement after acute interventions for spinal cord injury (SCI). First, it covers several important historical contributions to measuring the severity of neurological injury after SCI, such as the manual muscle test, sensory dermatomes and the Frankel Grades. These early tests served as building blocks for the International Standards for Neurological and Functional Classification of SCI (1992), which have become the current ‘gold standard’1 applied to various phases of a trial. Second, we need to appreciate how the severity of SCI is linked to the domains of function, and how the 1980 classification of impairment, disability and handicap2 evolved into the current classification of function,3 which encompasses body function/structure, activity and participation. Third, we will examine how these outcome measures have evolved in clinical trials over the past two decades and how they now link body structure to the capacity and performance domains. Finally, we will project the future evolution of measures that connect each domain of recovery, from the severity of injury to the fullest participation in society of the person with SCI.

Historical development

The International Standards for Neurological Classification of SCI represent a composite of measures of the severity of neurological impairment that have evolved over the past nine decades; they were reviewed in detail when the International Spinal Cord Society (ISCOS) approved them in 1992.4 They include the motor score, based on the manual muscle test of 10 key muscles on each side of the body, and the sensory scores for pin prick and light touch, each summed over 28 dermatomes on each side. The key muscles and key sensory points, which are used to determine the neurological levels, together with the American Spinal Injury Association (ASIA) Impairment Scale (AIS), a modification of the Frankel Grades, define the severity of the injury according to the extent of damage to the spinal cord and the completeness of injury.
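The scoring arithmetic described above can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not an official ASIA/ISCOS tool: it assumes the usual grading ranges (0-5 per key muscle on manual muscle testing; 0 absent, 1 impaired, 2 normal per dermatome for each sensory modality), and it deliberately ignores 'not testable' entries and the rules for deriving neurological levels and the AIS grade. The function names are illustrative only.

```python
from typing import Dict, Iterable

# Key muscles graded 0-5 by manual muscle test on each side:
# C5-T1 for the upper extremity, L2-S1 for the lower extremity.
KEY_MUSCLE_LEVELS = ["C5", "C6", "C7", "C8", "T1", "L2", "L3", "L4", "L5", "S1"]

# The 28 dermatomes tested on each side, C2 through S4-5.
DERMATOMES = (
    [f"C{i}" for i in range(2, 9)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + ["S1", "S2", "S3", "S4-5"]
)


def _checked_sum(grades: Dict[str, int], expected: Iterable[str], max_grade: int) -> int:
    """Sum grades after checking the set of levels and the 0..max_grade range."""
    if set(grades) != set(expected):
        raise ValueError("grades must cover exactly the expected levels")
    for level, g in grades.items():
        if not 0 <= g <= max_grade:
            raise ValueError(f"{level}: grade {g} outside 0-{max_grade}")
    return sum(grades.values())


def motor_score(side: Dict[str, int]) -> int:
    """One-side motor score: 10 key muscles x 0-5, maximum 50."""
    return _checked_sum(side, KEY_MUSCLE_LEVELS, 5)


def sensory_score(side: Dict[str, int]) -> int:
    """One-side score for a single modality (pin prick or light touch):
    28 dermatomes x 0-2, maximum 56."""
    return _checked_sum(side, DERMATOMES, 2)


# Hypothetical example: normal arm strength, no leg movement on the right side.
right = {lvl: (5 if lvl in KEY_MUSCLE_LEVELS[:5] else 0) for lvl in KEY_MUSCLE_LEVELS}
print(motor_score(right))  # 25
```

Summing both sides under these assumptions gives maximum totals of 100 for the motor score and 112 for each sensory modality.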

The manual muscle test, first developed by Lovett in 19125 and refined by the Medical Research Council in 1943,6 and the sensory dermatomes, mapped initially by Head and Campbell7 and later in more detail by Foerster,8 are the oldest measures of neurological impairment; they were subsequently incorporated, modified and applied in the ASIA Standards for Neurological Classification of SCI introduced in 1982.9 The ASIA Standards were, in part, a response to several publications in 1969. Michaelis10 made a plea for a more uniform classification of level, because a reference to the fifth cervical (C5) level could mean either the level of the bony fracture or the level of neurological damage. This led to a definition of severity based on the neurological site of the injury, such as C5 or T10. In the same year, Frankel introduced a system for grading (A through E) the completeness of injury to illustrate the benefit of postural reduction.11 Although some of these measures are nearly a century old, it is important for the next generation of clinical investigators to appreciate how the past has shaped our current concepts so that they may project their own future role in SCI care and cure. This appreciation is reflected in the following recognition of the founder of our society.

Sir Ludwig Guttmann, whom we honor today, is connected to several sources of the ISCOS International Neurological Standards just described. The first connection relates to the sensory key points used to identify the sensory levels of an SCI in the extremities and both the sensory and motor levels of the trunk. Foerster,8 Sir Ludwig's mentor in Breslau and the foremost neurosurgeon in Europe12 during the early 20th century, published ‘the dermatomes of man’ in 1933, the culmination of over 20 years of detailed observation and testing. During most of this period, Sir Ludwig worked with Foerster13 (Figure 1a), who is credited as the primary source of the dermatome chart from the classic textbook by Haymaker and Woodhall14 that we use daily in our examinations. We use this method to determine the extent of damage to the spinal cord, or the neurological level (Figures 2a and b). The second link is through Sir Ludwig's student and protégé, Hans Frankel, who created the Frankel Grades in a publication11 honoring Sir Ludwig on the occasion of his 70th birthday (Figure 1b). The Frankel Grades, which determine the completeness of injury and are the forerunner of the ASIA Impairment Scale, are the most frequently cited outcome measure in the literature (Figure 3). By using the word ‘function’ for Grade D (muscle strength of 3/5 or greater), Frankel implied walking function and thereby linked it to the severity of the injury, a linkage that we will explore further under domains.

Figure 1

(a) Dr O Foerster with chief assistant Ludwig Guttmann, 1929.13 The illustration is between pages 112 and 113, entitled ‘With Dr Otfrid Foerster in Breslau’. (b) Guttmann and Hans Frankel with Haile Selassie: late 1960s. Courtesy: Dr Hans Frankel.

Figure 2

(a) Figure 43, page 20, Scheme of Thoracic Dermatomes.8 Reprinted with permission from Oxford University Press. (b) Key sensory points. American Spinal Injury Association: International Standards for Neurological Classification of Spinal Cord Injury, revised 2000, Atlanta, GA, USA. Reprinted 2008. Reprinted with permission from ASIA.

Figure 3

Legend: Frankel, Frankel grade;11 AIS, ASIA impairment scale.4

It is fitting, therefore, to recall and recognize Sir Ludwig's relationship to today's topic through both his mentor, Foerster, and his protégé, Frankel, over a span of three generations. Sir Ludwig embraced Foerster's approach to research and treatment based on neurophysiology, whether applied to rehabilitation or surgery, and he transmitted this legacy to his professional progeny. As a result, practical clinical research became an integral part of the care of persons with SCI, and today's topic of outcome measures applied to clinical trials continues this tradition of the founder of our society.

Evolution of domains of function

The primary purpose of the ASIA Neurological Standards of 1982 was to reach agreement on a clinical classification of SCI severity. A disability measure was added to the severity classification later (1992–2002), and the classification was standardized for use as a research outcome measure when ISCOS adopted and endorsed it.4, 15, 16 To appreciate how severity and disability outcomes are used in research, however, we need to understand how the domains of function evolved over this period. Domains are a way of conceptualizing how the severity of injury in SCI (impairment) interacts with its impact on the person as an individual (disability) and in society. The evolution of the concepts of disability over the past 30 years has recently been summarized17 (Table 1).

Table 1 Concepts and terminology used by models of disability17

Nagi18 made the first effort to link impairment to disability when he introduced the domain of functional limitation. For example, a limited physical action, such as the inability to grasp a cylinder (functional limitation), explains how finger weakness (impairment) leads to the inability to grasp a glass to drink (disability). Functional limitation, however, was not incorporated into the World Health Organization (WHO) classification of impairment, disability and handicap.2 The 1980 WHO classification held that an impairment, such as finger weakness, resulted directly in disabilities, such as limited feeding, and thus omitted functional limitation as the linking domain between impairment and disability. Functional limitation was restored in a series of modifications of the domains of disability by the Institute of Medicine19, 20 (1991, 1997) and by the National Institutes of Health21 (1992), which introduced the societal and environmental impact on impairment/disability as a continuum. One proponent22 emphasized that an action such as reaching with the arm and/or grasping an object with the hand was needed to link measures of upper extremity muscle strength to self-care measures such as feeding and grooming. This is because improvement in feeding (disability) can be achieved with an adaptive device alone, with no change in finger strength; such an improvement would be misleading in a trial designed to improve strength and function.

In 2001, the WHO proposed an entirely new classification, the International Classification of Function, which eliminated most of the negative connotations of the 1980 classification. Disease/pathology became a health condition, impairment became body structure/function, and self-care/mobility was collapsed into activities, but functional limitation was again eliminated (Figures 4a and b). Although this classification offered many needed changes to that of 1980, such as eliminating the word handicap, it needs clarification, particularly in the activity domain.17, 22 In this domain, it is difficult to distinguish among functional limitations, capacity, activities, self-care/mobility and performance, because they overlap considerably. These different activities often appear to be lumped together under ‘functioning’ in the new classification, termed the International Classification of Functioning, Disability and Health.3

Figure 4

(a) World Health Organization (WHO) International Classification of Functioning, Disability and Health model of functioning.22 Figure 1, p 114. Reprinted with permission from J Rehabil Res Dev. (b) Modified International Classification of Functioning, Disability and Health model of functioning. Model extracts capability/functional limitation from activity limitation and explicitly divides activity into capacity and performance subdomains.22 Figure 2, p 115. Reprinted with permission from J Rehabil Res Dev.

The confusion over the International Classification of Functioning, Disability and Health terminology of domains is further illustrated by tests of hand function recently introduced as outcome measures. Each group of authors has applied the International Classification of Functioning, Disability and Health model using a different term, such as ‘capability’,22 ‘basic activity’23 or ‘capacity’,24 for the link between body function and performance. Two of the authors (Marino and Post) relate their measures to the former domain of functional limitation. This strongly suggests that ASIA/ISCOS and/or the Spinal Cord Outcomes Partnership Endeavor (SCOPE) should work to clarify the terminology for use in all SCI clinical trials.

Evolution of outcome measures in randomized multicenter clinical trials: lessons learned

Although the four major randomized multicenter clinical trials (RMCT) in SCI did not show significant functional improvement, it has been stated that ‘even failure of a treatment to restore significant function would be informative if the discrepancies between laboratory and clinical results could be reconciled…’.25 Information on the use of outcome measures, or lessons learned, may also be gained from ‘failed trials’ (Table 2). The ‘Time Line’ in Figure 5 lists the trials of the past 25–30 years by both the date each study was initiated and the date it was reported: the initiation date reflects when the outcome measure was chosen, and the publication date reflects when the results were reported. This listing does not include all trials; Tator's review26 is more comprehensive. These four trials were chosen because of the author's familiarity with them and their illustrative value.

Table 2 Lessons learned
Figure 5

Timelines of Multicenter Randomized Clinical Trials (MRCT) in Acute Spinal Cord Injury (SCI), Started and Reported (1985–2006). Word key: NASCIS 2, Second National Acute Spinal Cord Injury Study; GM-1, GM-1 ganglioside (Sygen); NASCIS 3, Third National Acute Spinal Cord Injury Study; BWSTT, body weight-supported treadmill training. MRCTs are shown in black; pilot or planned trials are shown in grey.

The modern era of RMCT in SCI opened with a clarion call announcing the first major advance in treatment, published as the lead article in the New England Journal of Medicine in May 1990. This study of the effect of high-dose methylprednisolone on outcomes in acute SCI was the first large RMCT conducted by the National Acute SCI Study (NASCIS) group, funded by the National Institutes of Health. It claimed27 an improvement of almost 10 motor points on one side, with a significant improvement in sensation. This RMCT of 333 subjects was followed 1 year later28 by another lead article in the New England Journal of Medicine, which reported a randomized, placebo-controlled pilot study of only 34 subjects treated with GM-1 ganglioside. In this study, the motor score improved by 15 points for both sides, with ‘a significant … improvement of Frankel grades from base line to the one-year follow-up.’28

Both studies were criticized: the methylprednisolone study because an improvement of 10 motor points was not equated with improved function,29 and the GM-1 ganglioside pilot study because its 34 subjects were not randomized into equal groups.29 The improvement in AIS/Frankel Grades in the ganglioside trial, however, encouraged the designers of the subsequent trial of 760 subjects to choose a modification of the AIS/Frankel Grades as its primary end point.30 Irrespective of past and current criticism,31 the NASCIS 2 trial was a well-designed RMCT and set the standard for trials of future interventions for SCI. The criticism that improved strength was not equated with function led to the addition of a functional measure of improvement, the disability scale known as the functional independence measure (FIM), which had been validated in the rehabilitation literature in 1987.32 This was the only validated measure of disability at the time and was used as a secondary end point in the NASCIS 3 trial, begun in December 1992.33 The report of the NASCIS 2 study also helped to stimulate a consensus among members of the NASCIS group, ASIA and ISCOS, which produced the International Standards for Neurological and Functional Classification of SCI in 1992 (ISNFCSCI).34 These Standards included the modified Frankel Grades (AIS), motor and sensory scores, neurological levels and the FIM.

The International Standards gained immediate acceptance,35 and their neurological items were incorporated at the outset of the second, very large RMCT, the ganglioside trial initiated in 1992. This trial chose a more robust primary outcome measure, the Benzel Scale,36 which combined the Frankel Grades with a Grade D expanded into three levels of walking. Although the Benzel Scale was not validated by current psychometric methods, it attempted to integrate an impairment scale with a functional measure of walking, as had been suggested by others37 and implied by Frankel in his definition of Grade D as functional strength. The need for a measure linking lower extremity weakness to locomotor function led to the development of the Walking Index for Spinal Cord Injury in 200038 and to the use of walking speed as a primary end point in the locomotor training trial initiated a year earlier.39 The ganglioside trial, the largest to date (760 subjects), also provided an invaluable database, which has allowed critical examination of recovery patterns based on AIS, motor and sensory scores.40, 41

Linking improvement in neurological function, such as leg strength, to mobility remains a challenge in clinical trials for several reasons. Because the primary end point of phase 1 clinical trials must be safety, participants typically have complete lesions, in which lower extremity recovery is limited. A robust neurological gain is therefore required to produce major improvements in mobility. In fact, the recent report of the International Campaign for Cures of Spinal Cord Injury Paralysis (ICCP) panel on outcomes42 advised caution in using the ASIA Impairment Scale as a primary end point in phase 1 studies, because it might demand too robust an improvement. In addition, there did not appear to be a fully validated performance scale (formerly disability measure) for practical use as a primary end point in subjects with complete SCI,42 although several well-validated functional capacity scales for walking, such as the 10-meter walk test and the walking index for spinal cord injury (WISCI), had emerged for use in studies of incomplete SCI. The spinal cord independence measure, developed as a disability scale in 1997,43 has recently been validated44 and is claimed to be superior to the FIM for several reasons: it was developed specifically for SCI and has been shown to be more precise. However, linking improvement in body structure to a global performance scale has limitations, because spinal cord independence measure scores can improve with no improvement in AIS grade or walking capacity45 (Figure 6). Improvement in mobility can result from training alone: a person with a complete transection of the thoracic spinal cord, who remains completely paralyzed in the legs but has normal arms, can progress from dependence in bed to complete independence in mobility using a wheelchair.

Figure 6

Recovery from a spinal cord injury.45 Figure 1, p 681. Reprinted with permission from J Neurotrauma. Word key: ASIA, American Spinal Injury Association; SSEP, somatosensory evoked potentials; SCIM, spinal cord independence measure; MEPs, motor evoked potentials; WISCI, walking index for spinal cord injury.

Although the US Food and Drug Administration has no official position, at a recent meeting in Washington, DC, attended by its representatives together with clinical investigators, members of the drug industry and other federal agencies, functional capacity measures linked to clinically meaningful improvement appeared to be preferred over improvement in body structure alone as the primary end point of a clinical trial (J Steeves: personal communication).

Emerging capacity/capability scales linked to domains of body structure and performance

A recent study46 of locomotor training in persons with incomplete SCI showed improvement in walking function on several measures, including functional capacity and performance (disability) measures that correlated with increased leg strength. Although this study failed to show a significant difference between the treatment and control arms, it did show improvement in walking with more intense training compared with historical controls. Persons with incomplete SCI (AIS grade C or D at enrollment, within 56 days of injury) walked with less physical assistance, as determined by the locomotor FIM and/or walking speed, the two primary end points. Secondary end points, such as the WISCI and the 6-min walk (a measure of endurance), all correlated with an increase in leg strength.47 Similarly, walking speed on the 10-meter walk test has shown a high correlation with other timed tests and a good correlation with the WISCI48 during motor recovery after SCI, which led several groups to recommend these as the best-validated measures of walking capacity for SCI clinical trials.42, 49 Improvement in lower extremity motor scores from baseline explained most of the variance in improvement in WISCI, thus linking body structure with walking capacity, and together these measures correlated with the FIM subscore for walking. A recent publication from the European Multicenter Study about Spinal Cord Injury (EMSCI) database has shown the linkage of improvement in AIS grades (body structure) with the 10-meter walk test and WISCI (capacity measures) in an effort to validate the performance domain of indoor walking as assessed by the spinal cord independence measure50 (Figure 7a).
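The walking-capacity measures above rest on simple arithmetic: 10-meter walk test speed is distance divided by time, and the strength of a linkage such as lower extremity motor score gain versus walking gain is commonly summarized with a correlation coefficient, whose square is the share of variance explained by a simple straight-line fit. The Python sketch below is a hypothetical illustration under those assumptions; it assumes the timed distance is exactly 10 m (test protocols vary), the function names are illustrative, and the sample numbers are invented for demonstration, not data from the studies cited.

```python
from math import sqrt
from typing import Sequence


def walking_speed(time_s: float, distance_m: float = 10.0) -> float:
    """Gait speed in m/s, assuming the timed distance is exactly distance_m metres."""
    if time_s <= 0:
        raise ValueError("walk time must be positive")
    return distance_m / time_s


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples; r**2 is the
    share of variance in y explained by a straight-line fit on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented illustrative numbers, not study data.
lems_gain = [4, 10, 15, 22, 30]               # change in lower extremity motor score
walk_times = [40.0, 25.0, 18.0, 12.5, 9.0]    # 10-meter walk time at follow-up (s)
speeds = [walking_speed(t) for t in walk_times]
print(round(walking_speed(12.5), 2))  # 0.8
r = pearson_r(lems_gain, speeds)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

A correlation of this kind shows association only; establishing that a capacity measure is responsive and clinically meaningful requires the further validation discussed in this review.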

Figure 7

(a) Word key: AIS, ASIA impairment scale; LEMS, lower extremity motor scores; 10 MWT, 10-meter walk test; SCIM, spinal cord independence measure; WISCI, walking index for spinal cord injury. (b) Word key: CUE, capabilities of upper extremity; GRASSP, graded redefined assessment of strength, sensibility, and prehension; SCIM, spinal cord independence measure; FIM, functional independence measure.

Although the introduction of the Quadriplegic Index of Function51 emphasized improvement in upper extremity function in the acute rehabilitation setting, it was a measure of self-care and included the use of special devices and physical assistance. Rehabilitation professionals, surgeons and engineers53 concerned with tendon transfers and functional electrical stimulation to restore arm/hand function in individuals with cervical SCI have 40 years of experience in measuring hand function.52 Until recently, however, few studies combined functional assessment of grasp and reach.54 The Capabilities of Upper Extremity questionnaire has been reported as valid over the past 10 years,54 and together with the Van Lieshout test23 and components of another hand test, the graded redefined assessment of strength, sensibility, and prehension,24 it illustrates efforts to link improvement in arm strength with improvement in self-care through a standardized measure of capability/basic activity/capacity (formerly functional limitation). These recent efforts recognize that an improvement in performance (self-care) alone may be due solely to training with adaptive devices, without a significant increase in motor score (body structure). Capability or capacity measures therefore need to show the linkage between body structure and self-care (Figure 7b).

There is a subtle difference between capability and capacity measures: assistance and specific self-care activities are never permitted in a capability measure. However, whether the link between body function and performance is shown by a capability or a capacity measure, standardized, rigorous assessments of function are essential and must be independent of environmental/societal adaptations. Examples of these linkages have been illustrated in several recent studies of walking function47, 50 and upper extremity function.24

The future

Each measure, whether of body function, capacity or performance, has limitations. Greater precision in measuring body function has been elusive. Although neurophysiological measures of body function/structure have not shown promise in some acute studies,45 they may have value in persons with chronic SCI.55 Quantitative sensory56, 57 and motor testing procedures also offer the promise of greater precision. Recent advances in neuroimaging of SCI hold promise for the development of anatomical/physiological surrogates of injury severity for use in clinical trials. The dramatic visualization of interrupted white matter tracts and glial scar by spinal cord diffusion tensor imaging58 may enable us to monitor recovery patterns in the future. Diffusion tensor imaging has also shown quantitative changes that depend on the severity of SCI,59 and these changes have been correlated with the ASIA Impairment Grades.60

Although the WISCI and the 10-meter walk test are recommended42, 49 as the best-validated measures of walking capacity today, this does not mean they will remain so, as we learned from the FIM in 1997 (Table 2). As we apply these measures in future trials, we need to validate their responsiveness and show that the improvement they detect has clinical significance. The minimal clinically important difference has gained interest; it was originally defined as ‘the smallest difference in score in the domain of interest which patients perceive as beneficial.’61 Both the consumer's and the clinician's perspectives on what constitutes a meaningful change are valuable; however, their views do not always agree.62 Ultimately, increased participation in society63 and improved quality of life64 should be shown. However, factors other than repair of neurological injury may contribute to these gains, and these are beyond the scope of this review.

Finally, the linkage of body function, capacity and performance should be established before entering a phase 3 trial, especially if all of these measures are to be included in the primary outcome measure. Multiple primary end points or a global statistical test65 may be considered. These issues need to be re-examined regularly, and we must encourage and applaud the combined efforts of ASIA/ISCOS and SCOPE to update outcome measures for use in clinical trials.
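As one example of a global statistical test, O'Brien's rank-sum procedure combines several end points into a single comparison: each outcome is ranked across all subjects, each subject's ranks are summed, and the summed ranks are compared between arms with a two-sample t statistic. The Python sketch below is a hypothetical illustration, not necessarily the test described in reference 65; it assumes higher scores are better on every outcome, that each arm has at least two subjects, and that the rank sums are not all identical. It returns only the t statistic; a p-value would come from a t distribution with n1 + n2 - 2 degrees of freedom.

```python
from statistics import mean, variance
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def obrien_rank_sum_t(treated: List[List[float]], control: List[List[float]]) -> float:
    """O'Brien-style rank-sum statistic for several outcomes (higher = better).

    Each inner list is one subject's scores on the same outcomes, in the same order.
    Returns a pooled-variance two-sample t statistic on the per-subject rank sums.
    """
    subjects = treated + control
    n_outcomes = len(subjects[0])
    totals = [0.0] * len(subjects)
    for m in range(n_outcomes):
        # Rank each outcome across all subjects pooled from both arms.
        for idx, r in enumerate(average_ranks([s[m] for s in subjects])):
            totals[idx] += r
    a, b = totals[: len(treated)], totals[len(treated):]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


# Hypothetical subjects scored on two outcomes (e.g., a motor score and a walking measure).
treated = [[3, 30], [4, 40], [5, 50]]
control = [[0, 0], [1, 10], [2, 20]]
print(round(obrien_rank_sum_t(treated, control), 3))  # 3.674
```

Ranking puts outcomes on different scales (motor points, metres per second, index levels) onto a common footing, which is one reason rank-based global tests are attractive when several linked measures are combined.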

In summary, Foerster's dermatome map, the Frankel Grades and the manual muscle test helped to classify the severity of SCI through the development of the International Standards. This classification of impairment, later modified for research purposes, was linked to self-care/mobility through rigorous, standardized measures of capacity/capability. Such linkages are essential for outcome measures in clinical trials of neurological restoration, but their terminology requires clarification by SCI investigators. Their role has been refined through use in clinical trials, with many lessons learned about their strengths and limitations. These lessons should provide the framework for developing measures that better quantify the pathophysiology of injury severity and the minimal clinically significant change in restored capacity and capability of function.