Introduction

The International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI)1 are widely accepted among clinicians and researchers for classification of the location, severity and extent of a human spinal cord injury (SCI). ISNCSCI has been well investigated2 throughout its >40-year history. The severity classification was fundamentally revised during this time period. In detail, the American Spinal Injury Association (ASIA) Impairment Scale (AIS), which superseded3 the original Frankel Scale,4 introduced several changes aiming at a more consistent and objective description, on the basis of the scores of a standardized examination of myotomes and dermatomes.5

The six-point manual muscle test6 adopted for ISNCSCI can only be performed in the myotomes of arms and legs. Thus, for all other segments (cervical C2–C4, thoracic T2–L1 and sacral S2–S5) the graded assessment of key muscles is difficult and has not been formally included in the protocol. Consequently, the precise motor classification of individuals with thoracic or very high cervical lesions relies on inference from the sensory scores. We recently identified the determination of motor level and motor incompleteness in these areas as the most difficult classification tasks.7

The anatomical constraints together with the complex AIS definitions lead to a large set of classification rules. A rater must either memorize all ISNCSCI classification rules correctly or look them up in the available ISNCSCI booklet8 or refer to a condensed version on the backside of the assessment form. The necessity and value of training on the ISNCSCI assessment has been addressed in several previous publications.7, 9, 10, 11

The ISNCSCI evaluation consists of two parts, which require different types of skills.3, 12 The physical assessment of an individual consists of segmental sensory (light touch, pinprick) and motor (manual muscle test) evaluations, including an anorectal examination. The segmental level and magnitude of these sensorimotor scores form the basis for the subsequent classification part. Accurate classification is then based on specific ISNCSCI rules that use the collected scores to determine all functional levels (motor/sensory and neurological level of injury (NLI)), as well as the AIS.

Although the reliability of the ISNCSCI assessment has been investigated,13, 14, 15 an analysis of the reliability of the ISNCSCI classification, as performed by typical clinicians on a typical cohort of SCI patients, has not been done. Therefore, the aim of this investigation was to describe and quantify any discrepancies in the ISNCSCI classification by typical SCI clinicians versus validated computational ISNCSCI algorithms.16 This study was conducted as a part of the quality management system (ISO 9001:2008) of the European Multicenter Study on Human Spinal Cord Injury (EMSCI - http://emsci.org).17

Materials and methods

Data sets from the first years of the EMSCI network (May 2003–April 2005; 2 years after project launch in 2001) were used to compare manual versus computerized ISNCSCI classification.16 ISNCSCI data sets in EMSCI contain classification variables, as well as all segmental scores of light touch, pinprick, muscle function and anorectal examination. During the indicated time period, all ISNCSCI examinations and subsequent classifications were manually performed by SCI clinicians. The classification was carried out either directly after the examination or at the latest before entering the data set into the database. At most, a few days elapsed between examination and classification.

EMSCI data evaluation was computerized in terms of ISNCSCI classification in April 2005.18 As part of this computerization, all ISNCSCI data sets, including the previously manually classified data sets, were recalculated by the computer program. A backup of the manually classified data sets was used for this study. This reclassification process is repeated whenever ISNCSCI is revised or a bug in the classification software is fixed.

The training levels of the clinicians performing the ISNCSCI assessments and classification within the time frame of this study were retrospectively surveyed by phone or by electronic mail and are summarized in Table 1.

Table 1 Participating SCI centers and the number of patients included in this work

A flowchart of data processing together with the procedure for data set selection is provided in Figure 1. All data sets from the Spinal Cord Injury Center at the Heidelberg University Hospital were excluded, as the computerized ISNCSCI evaluation was developed in-house and used at the site before the official start of computerized classification throughout the EMSCI network. In addition, all ISNCSCI data sets containing not testable (NT) segmental scores in either light touch, pinprick, motor or anorectal examination were excluded (Box ‘Exclusion of fragmentary data set (examination)’ in Figure 1) owing to missing instructions on how to deal with this issue in the 2003 reference manual.

Figure 1

Data processing flowchart accounting for excluded data sets at every processing stage. Gray shaded boxes describe processing steps that are necessary owing to the ISNCSCI clarifications published by Waring et al.19 2009. The fork is necessary because the excluded data sets for the ZPP analysis are exclusively relevant for this analysis. Excluding them globally would decrease the overall sample size for general ISNCSCI analysis (right fork). AIS, ASIA Impairment Scale; ASIA, American Spinal Injury Association; EMSCI, European Multicenter Study on Human Spinal Cord Injury; ISNCSCI, The International Standards for Neurological Classification of Spinal Cord Injury; NLI, neurological level of injury; ZPP, zone of partial preservation.

EMSCI’s strict inclusion and exclusion criteria allow only single-event traumatic or ischemic SCIs. Concomitant peripheral nerve lesions above the level of injury (that is, plexus brachialis impairment), preinjury polyneuropathy, multiple SCIs and severe traumatic brain injuries are the exclusion criteria. Overall, these criteria reduce the probability of concomitant non-SCI-related neurological impairments above the level of injury. This is particularly relevant, because information in the comment box of the ISNCSCI worksheet, where non-SCI-related issues are usually documented, is not used in the computational classification process.

Examinations and manual scoring were based on the ISNCSCI 2003 reference manual,12 which has now been superseded.1, 3 Even though the basic classification rules have remained unchanged, some clarifications were published by Waring et al.19 and the revised ISNCSCI booklet8 (2011) and subsequently integrated into the computerized classification.

These clarifications led to the implementation of two additional rules into the classification algorithms: (1) Motor levels, sensory levels and the single neurological level were set to C1, if segment C2 was already impaired,8 and (2) if there is no spared function below the sensory or motor level in a person with a complete injury the sensory or motor level should be listed in the zones of partial preservation (ZPP) block.19

All data sets affected by these rules were excluded from the analysis to avoid any bias of the results in favor of the computational algorithms. In detail, all cases in which clarification (1) applied were excluded from all analyses (gray shaded box ‘Exclusion of NLI C1’ in Figure 1) and cases in which clarification (2) applied were excluded from the ZPP analyses only (gray shaded box ‘Exclusion of data sets where respective levels match ZPP’ in Figure 1). Apart from these clarifications affecting the technical ISNCSCI implementation, the fundamental classification rules were not revised. However, some rules were rewritten to make the wording more precise.1 The motor level determination was clarified (for areas where there are no myotomes to test), as was the correct use of the appropriate reference levels throughout the AIS classification process. In the recent standard, it is explicitly stated that to distinguish between sensory incomplete (AIS B) and motor incomplete lesions (AIS C/D) the motor level on each side is used as reference level, whereas the NLI is used to discriminate between AIS C and AIS D.
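The side-specific reference level described above can be illustrated with a minimal Python sketch. It is not the EMSCI implementation; the function name, the segment list and the example grades are hypothetical. It covers only the check for motor function more than three levels below the motor level of one side. The anorectal examination and the AIS C versus AIS D decision, which is referenced from the NLI, are deliberately left out.

```python
# Rostral-to-caudal order of spinal segments (illustrative helper, not from the study).
SEGMENTS = (
    [f"C{i}" for i in range(1, 9)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + ["S1", "S2", "S3", "S4-5"]
)
INDEX = {segment: position for position, segment in enumerate(SEGMENTS)}


def motor_function_beyond_three_levels(motor_level, key_muscle_grades):
    """Return True if any key muscle more than three segments caudal to the
    motor level of this body side has a grade above 0.

    motor_level: motor level of one side, e.g. "C5".
    key_muscle_grades: dict mapping key muscle segment to grade (0-5), same side.
    """
    reference = INDEX[motor_level]
    return any(
        grade > 0 and INDEX[segment] - reference > 3
        for segment, grade in key_muscle_grades.items()
    )


# Hypothetical example: motor level C5 with a trace of function in C8 only.
# C8 is exactly three segments below C5, so it does not count.
only_c8 = {"C5": 5, "C6": 2, "C7": 0, "C8": 1, "T1": 0}
# Same side with a trace of function in T1, four segments below C5.
with_t1 = {"C5": 5, "C6": 2, "C7": 0, "C8": 0, "T1": 1}

print(motor_function_beyond_three_levels("C5", only_c8))
print(motor_function_beyond_three_levels("C5", with_t1))
```

The two example cases differ by a single segment, which is the kind of borderline situation in which the choice of reference level decides between a sensory incomplete and a motor incomplete classification.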

Besides these clarifications directly or indirectly relevant to this work, the recent ISNCSCI booklet (2011) introduced several additional changes, among them altered positions for motor testing, the use of non-key muscles to distinguish in borderline cases between AIS B and AIS C, the renaming of the anorectal sensory test to deep anal pressure and a clarification that the ZPPs are not referenced from the NLI but instead from the corresponding sensory or motor level.1

All included data sets were computationally reclassified using the latest validated EMSCI ISNCSCI implementation of ISNCSCI’s current revision published in 2011.1 The latest version of the calculator was used instead of the version implemented during the study period (2003–2005), because the comprehensive validation of the calculator’s algorithms was performed later (2009–2010). The validation was performed by the first and the last author of this study. In an iterative approach, the algorithms were tuned on the basis of over 5000 data sets of the EMSCI database that were free of not testable (NT) scores until human ISNCSCI experts and the computer program agreed on the classification results. Public sources of correct data sets such as the reference manual (Appendix B),12 the booklet8 and a collection of difficult cases20 were additionally and successfully used for validation. To the best of our knowledge, the current EMSCI algorithms implement ISNCSCI correctly so that differences between the computer and the clinician can be interpreted as the clinician’s error. The details of the algorithms and the validation process are published elsewhere.16

Both manual (clinician) and computational (computer) classification methods were statistically tested for differences in the following ISNCSCI variables: right/left sensory level, right/left motor level, AIS, sensory and motor zones of partial preservation for right and left side. At the time of the assessments (2003–2005), the single NLI was not listed in the ISNCSCI assessment sheet (REV2001)12 and was therefore not included in our analysis.

The degree of concordance is presented as raw percentages and histograms and confirmed statistically using Wilcoxon’s matched pair test (all levels and ZPPs) and Bowker’s test (AIS). All analyses were performed with Statistica 9.1 (StatSoft, Inc., Tulsa, OK, USA) and IBM SPSS Statistics 21 (IBM Corporation, Chicago, IL, USA). Significance level was set to α=0.01. This more rigorous significance level was chosen instead of the commonly used level of 0.05 owing to the retrospective character of this study and the comparatively large sample sizes. In general, a lower significance level reduces the risk of false positive tests.
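As an illustration of how raw agreement and signed level differences of this kind can be derived, a minimal Python sketch follows. It is not the software used in this study; the function names, the segment list and the example pairs are hypothetical. The sign convention (computational minus manual, so that a positive value means the clinician assigned a more rostral level) follows the one used in the figures.

```python
# Rostral-to-caudal order of spinal segments (illustrative helper, not from the study).
SEGMENTS = (
    [f"C{i}" for i in range(1, 9)]
    + [f"T{i}" for i in range(1, 13)]
    + [f"L{i}" for i in range(1, 6)]
    + ["S1", "S2", "S3", "S4-5"]
)
INDEX = {segment: position for position, segment in enumerate(SEGMENTS)}


def level_difference(clinician, computer):
    """Computer level minus clinician level, in segments.

    Positive values mean the clinician assigned a more rostral level.
    """
    return INDEX[computer] - INDEX[clinician]


def raw_agreement(pairs):
    """Percentage of (clinician, computer) pairs with identical levels."""
    matches = sum(1 for clinician, computer in pairs if clinician == computer)
    return 100.0 * matches / len(pairs)


# Hypothetical (clinician, computer) motor levels for four data sets.
example_pairs = [("C5", "C6"), ("T4", "T4"), ("L3", "L2"), ("C7", "C7")]

differences = [level_difference(a, b) for a, b in example_pairs]
print(differences)                   # signed differences for the histogram
print(raw_agreement(example_pairs))  # percentage of exact matches
```

The list of signed differences is what a histogram of level deviations would be built from, and the same paired values are the input for a matched pair test.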

Results

The database sample for this study contained 420 eligible ISNCSCI data sets of 185 patients treated in six SCI centers (Table 1). Table 1 also lists the involved clinicians performing ISNCSCI examinations and classifications in these centers at that time, together with an overview of their training levels. The differences between manual and computational ISNCSCI scaling, scoring and classification are summarized in Table 2. The lowest agreement was found for motor levels (right: 62.1%; left: 61.8%) followed by motor ZPP (right: 81.6%; left: 80.0%) and then AIS (83.4%). The differences in the motor levels (right: P=0.002; left: P=0.003) and the AIS (P=0.001) were significant. Sensory levels showed the best concordance (right: 90.8%; left: 90.0%), as did sensory ZPP (right: 91.0%; left: 92.2%). Histograms for all level variables are presented in Figures 2a–d. Positive differences on each x axis indicate that the clinician determined a more rostral level than the computer implementation. While sensory levels (Figure 2a), sensory ZPPs (Figure 2c) and motor ZPPs (Figure 2d) show a symmetrical distribution of errors around the correct level, motor levels (Figure 2b) are skewed toward positive differences, meaning that clinicians tended to assign a more rostral level.

Table 2 Agreement between computational and manual scoring, scaling and classification: manual results were subtracted from computational results, and thus, for example, a difference of one segment means that the manually assigned level was one segment rostral or caudal to the level determined by the computational algorithm
Figure 2

Differences between clinicians and computational ISNCSCI scoring, scaling and classification for all derived ISNCSCI variables (Sensory levels (a), Motor levels (b), Sensory ZPPs (c), Motor ZPPs (d) and AIS (e)). Raw agreement in percent is displayed on the y axes of all subfigures. Positive differences on each x axis indicate that the SCI professional determined a more rostral level than the computer implementation. Subfigures a–d have the differences in levels on their x axes, subfigure e has the difference in AIS grades. ASIA, American Spinal Injury Association; ISNCSCI, The International Standards for Neurological Classification of Spinal Cord Injury; ZPPs, zones of partial preservation.

The motor level agreement was analyzed in more detail (Figure 3) by taking into account that only myotomes within the arms (Figure 3b) and legs (Figure 3d) are included in ISNCSCI’s manual muscle examination. Different error patterns are revealed for all other myotomes (Figures 3a and c). The skewness in the overall motor level agreement (Figure 2b) is unambiguously caused by the misclassifications of motor levels within the spinal segments of the arms (C5–T1) and legs (L2–S1) (Figures 3b and d, respectively).

Figure 3

Differences in motor level determination described as raw agreement. Positive differences on each x axis indicate that the SCI professional determined a more rostral level than the computer implementation. In subfigure (a), motor levels are within the high cervical region (C2–C4), and in subfigure (c) motor levels are within the thoracic region (T2-L1). In those segments by definition the motor level follows the sensory level, because the myotomes cannot be assessed. Subfigures (b) and (d) depict all testable myotomes on arms (b) and legs (d).

The correct classification of AIS grades C (concordance 54.5%) or B (concordance 66.7%) appeared to be more difficult in comparison with AIS grades A (concordance 93.4%) or D (concordance 86.0%; Figure 4). AIS B is most commonly misinterpreted as AIS C and vice versa (AIS B as AIS C: 29.4%; Figure 4b and AIS C as AIS B: 38.6%; Figure 4c).

Figure 4

Degree of concordance in AIS classification of computational algorithms versus clinicians. The correct classification is arranged by AIS (a–d). White sectors display concordance between the computer and the clinician. Accordingly, gray sectors display discordance. The gray shading encodes the percentage of discordance. AIS, ASIA Impairment Scale; ASIA, American Spinal Injury Association.

Discussion

In this retrospective analysis of neurological data sets, obtained in the early years of EMSCI (2003–2005), the differences in classification of ISNCSCI variables by clinicians and computational algorithms were compared in a large European cohort of traumatic and ischemic SCI subjects. This kind of analysis was previously performed exclusively in the artificial setting of ISNCSCI instructional courses.7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 In the framework of instructional courses, the effects of formal ISNCSCI training are typically assessed by the same test before (pre-test) and after the training (post-test). In both tests, the attendees rate and classify a predefined set of ISNCSCI cases. The main outcome measure is the change in the percentage of correct answers between pre-test and post-test. Table 3 summarizes the results of our own instructional courses together with the results of the present study. Besides the previously published result7 that classification skills improve substantially with training, two conclusions can be drawn from the comparison of the instructional course results with this study’s results. First, the classification skills of the clinicians were much better than those of the course participants at the stage of the pre-test, and, second, the error pattern is slightly lower but in the same range as the post-test results (Table 3).

Table 3 Comparison of clinicians’ classification errors (fourth column) versus ISNCSCI raters before an instructional course (second column) and trained ISNCSCI raters after the instructional course (third column)

With regard to the primary aim of this study, the comparison of clinicians versus computational data evaluation, the manual determinations of the motor levels and of the AIS were identified as the most challenging steps. This corresponds to previously published results on the effects of standardized formal training in the ISNCSCI classification part7, 10 and a conference proceeding describing the differences between human and computational AIS classification in a North American cohort.22

Our results show that motor levels are frequently (26.4%; Figure 2b) classified one segment rostral to the correct level, indicating that the ISNCSCI definition of motor level was not applied properly. ISNCSCI defines the motor level as the most caudal key muscle with a manual muscle test score of at least 3 out of 5, provided that all rostral key muscles are judged to be intact and unimpaired (5/5). The motor level can be different for the right and left side of the body. Some of the clinicians’ motor level classification errors might be attributed to a misleading motor level definition on the front side of the old, superseded ISNCSCI examination sheet (revision 2003). On this sheet, a brief statement right next to the boxes of the motor level defines all neurological levels as ‘the most caudal segment with normal function’, which is correct for the sensory levels but not for the motor levels (see the correct definition above). The current worksheet (REV 02/13) corrects this issue and provides an updated motor level definition on page 2. Determination of motor levels seems to pose a general problem for the raters. We found different error patterns (Figure 3) for the myotomes of the arms and legs than for the myotomes that are not examined in ISNCSCI, which lie in the high cervical (C2–C4), thoracic (T2–L1) and low sacral (S2–S5) regions. Please refer to Kramer et al.23 and Steeves et al.24 for further considerations and discussions regarding motor levels.
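The difference between the two definitions can be made concrete with a minimal Python sketch. It is a simplified illustration restricted to the arm key muscles of one body side, not the validated EMSCI algorithm; the function names and the example grades are hypothetical. A return value of None means that the level lies rostral to the tested myotomes, where it would follow the sensory level; that step is not modeled here.

```python
# Key muscle segments of the arm in rostral-to-caudal order.
ARM_KEY_MUSCLES = ["C5", "C6", "C7", "C8", "T1"]


def motor_level(grades):
    """ISNCSCI-style reading: most caudal key muscle graded at least 3,
    provided that all key muscles rostral to it are graded 5."""
    level = None
    for segment in ARM_KEY_MUSCLES:
        grade = grades[segment]
        if grade >= 3:
            level = segment
        if grade < 5:
            break  # nothing caudal to an impaired muscle can qualify
    return level


def most_caudal_normal(grades):
    """Reading taken from the old worksheet: most caudal segment with
    normal (5/5) function. Appropriate for sensory levels only."""
    level = None
    for segment in ARM_KEY_MUSCLES:
        if grades[segment] < 5:
            break
        level = segment
    return level


# Hypothetical grades for one arm.
example = {"C5": 5, "C6": 5, "C7": 3, "C8": 1, "T1": 0}

print(motor_level(example))         # C7
print(most_caudal_normal(example))  # C6
```

In this example, the worksheet reading yields a level one segment rostral to the motor level defined by ISNCSCI, which matches the direction of the most frequent deviation observed in Figure 2b.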

Another finding from this study is that AIS grades B and C are prone to being confused with each other (Figure 4), indicating that the determination of motor incompleteness is the most challenging step of the AIS classification. Therefore, determination of motor levels and motor incompleteness should be emphasized in ISNCSCI training and documentation. Instructive examples should be provided, which will help raters to understand the respective rules better and reduce the likelihood of misconceptions. With the recent activities,5, 25 ASIA’s International Standards Committee already provided instructions for better interpretation together with some example cases regarding these issues. In fact, this issue was a highlighted topic in the 2011 revision1 and the accompanying reference publication.5

On the basis of the results of this study, training on ISNCSCI classification is strongly recommended for clinical practice, as well as for research. Training programs are available online (International Standards Training e-Learning Program (InSTeP)3 developed by ASIA) and usually form part of research networks7, 10 and clinical trials.26, 27 In addition, for research, one should consider accrediting trial investigators as ISNCSCI raters only after they have successfully participated in workshops on the theory and practice of all aspects of the ISNCSCI assessment and classification. This is in line with the current guidelines of the ‘International Campaign for Cures of Spinal Cord Injury Paralysis (ICCP)’.28

Insights from this study may help to not only carefully interpret ISNCSCI data sets from previously conducted or currently running clinical trials, but also to plan future interventional trials in SCI. As outlined here, validated computer-aided algorithms will likely eliminate errors in ISNCSCI classification by humans.16

Limitations

This study uses data sets that were obtained almost 10 years ago and were predominantly classified by residents with different classification skills and levels of training. A correlation between training levels and classification errors cannot be calculated, because this information was not documented during the study period. These issues must be carefully taken into account when interpreting the outcomes of this study. Over the past decade, ASIA’s International Standards Committee has continuously worked on the clarification of the classification rules and on better education of the SCI community. It can be assumed that classification skills in large-scale clinical trials are much better than in this cohort of typical clinicians, who were not explicitly trained for SCI clinical trials. However, for projects with smaller budgets such as SCI registries and pilot studies in SCI, the findings of this study are still important to (1) estimate the bias produced by manual classifications performed by nonoptimally trained clinicians and (2) reinforce the important message that continuous training in ISNCSCI is needed. An effective way to minimize human classification errors and to reclassify ISNCSCI examinations according to new revisions of the standards is to use computational ISNCSCI classification.

However, there are limitations in relying exclusively on a computational approach. Ideally, an ISNCSCI examiner performs a preliminary manual classification during the examination. This helps to avoid misclassification of non-SCI-related issues above the level of injury and to identify the most crucial dermatomes and myotomes for classification. The latter include those segments in which motor function is preserved more than three levels below the motor level, which is important for the accurate classification of AIS B versus AIS C/D. As a consequence, the examiner can focus on these dermatomes and myotomes to ensure conclusive scoring results. Relying exclusively on computer classification might cause this very important neurological diagnostic skill to be lost over time. One should keep in mind that correct examination scores represent the critical prerequisite for a correct ISNCSCI classification. Therefore, future studies will systematically explore the effects related to the examination technique in the EMSCI sample.

In early versions of the EMSCI database it was not mandatory to fill in all ISNCSCI variables. This fact alone resulted in a low share of fully evaluated ISNCSCI examination sheets, especially with regard to the sensory and motor levels (63.4%, Figure 1). In contrast, ASIA Impairment Scales were determined conscientiously (98.8%, Figure 1). This work used data sets that were acquired before the institution of formal ISNCSCI instructional courses in the EMSCI network. Since 2006, more than 250 participants have been trained in an ongoing ISNCSCI training program.7 It is anticipated that the overall ISNCSCI skills increased in the EMSCI network, not only because of the ongoing training but also owing to the online training efforts of ASIA and the efforts of the International Standards Committee for better clarification of ISNCSCI variables, which will hopefully lead to more consistent standards. The information in the worksheet’s comment box is currently not incorporated into the computational algorithms, because data extraction from this free text information is a nontrivial task. A first step toward including more clinical expertise in the algorithms might be to extend the asterisk nomenclature, currently used only for muscle grade 5, to all motor and sensory grades. This would identify situations in which the tested impairment may not be caused by the SCI. In computational classification, a grade marked with an asterisk would be treated as intact without the need for information retrieval from the comment box.

Conclusion

Clinicians commit classification errors when assessing the neurological impairment after SCI. The most difficult tasks are the correct determination of motor levels and motor incompleteness. We recommend more clarification and clearer examples with regard to these issues in upcoming ISNCSCI revisions. Training is strongly recommended for clinical practitioners as an effective means for reducing classification errors. It should be a prerequisite for accreditation of examiners in clinical trials. For consistent classification, data sets should be pre-analyzed by ISNCSCI experts and finally processed by computational ISNCSCI algorithms as a means of effective quality control.

Data archiving

There were no data to deposit.