With great interest we read the work “Achieving assessor accuracy on the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI)” by Armstrong et al. [1], presenting the results of a retrospective analysis of 208 ISNCSCI worksheets obtained in three multicentre “Spinal Cord Injury and Physical Activity (SCIPA)” trials. The authors report that only one quarter of the 184 ISNCSCI worksheets reviewed by an expert panel using a validated ISNCSCI calculator [2] were error-free. The remaining worksheets contained one or more errors (242 in total), mainly in the determination of (in descending order of error rate) motor levels (ML), motor/sensory zones of partial preservation (ZPPs), sensory levels (SL), motor/sensory scores and the ASIA Impairment Scale (AIS).

Given the high number of inaccuracies, the authors conclude that continued training and computerized algorithms are essential to ensure accurate ISNCSCI scoring, scaling and classification and to achieve confidence in clinical trials. However, we argue that information essential to justify this statement and its generalization is missing. More generally, we would like to recommend a basic set of reporting items to improve the comparability of studies investigating ISNCSCI assessor accuracy.

The paper by Armstrong et al. does not contain detailed information about the distribution of the investigated patient group with respect to neurological level of injury (NLI) and AIS. Without this information, a direct comparison with error rates reported in other studies, e.g., a previous retrospective analysis and computerized re-classification of 420 manually classified ISNCSCI data sets conducted by the European Multicenter Study about Spinal Cord Injury (EMSCI) [3], is not possible. Since the classification of incomplete lesions is known to be more difficult [4], every study analyzing ISNCSCI classification performance needs to report how well the analyzed group follows a representative AIS distribution. In this regard, the inclusion criteria of the SCIPA trials might have introduced some bias (“Full-On” trial (n = 116): NLI C6–T12, AIS A–D; “Switch-On” trial (n = 22): NLI < T12, AIS A–C; “Hands-On” trial (n = 70): NLI C2–T1, AIS A–D).

It is also important to report not only the frequency but also the magnitude of errors. Unfortunately, the authors provide only one detail on this issue, stating that 47.6% of the errors in ML or SL deviated by ≥2 levels. Interestingly, this percentage is twice as large as the 23% reported in EMSCI [4], even though none of the SCIPA trials included patients with lumbar or sacral NLIs, in whom ML errors are most frequent [4].

We fully support the authors’ conclusion that continued training is necessary to ensure accurate ISNCSCI scoring, scaling and classification [5, 6]; however, no information about repeated training in the SCIPA trials is provided. Additionally, we strongly believe that it is necessary to precisely define the content of the training and to assess its outcome and efficacy quantitatively, e.g., by pre-/post-training classifications of difficult cases. This is important not only for objectively assessing the skills of examiners, auditors and trainers and for defining minimum requirements for study assessors, but also for adjusting the contents of the training course. The latter is essential for eventually addressing non-ISNCSCI-specific errors such as the 52 score summation errors, of which only 3 were corrected by the experienced audit panel. With appropriate formal ISNCSCI training, a high level of correctness (approximately 90%) can be achieved [5, 6].
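
To illustrate how trivially a computerized plausibility check can intercept such summation errors, consider the following minimal Python sketch. The per-side worksheet representation and the function name are simplifications of ours and do not reflect the internals of any published calculator:

# Minimal sketch of an automated motor-score summation check.
# The per-side dictionary of key muscle grades is a hypothetical
# simplification, not the data format of any published calculator.
RIGHT_MOTOR = {"C5": 5, "C6": 5, "C7": 3, "C8": 2, "T1": 1,
               "L2": 0, "L3": 0, "L4": 0, "L5": 0, "S1": 0}

def motor_subscore(grades):
    """Sum the ten key muscle grades (0-5 each) of one side."""
    return sum(grades.values())

entered_total = 17                             # total transcribed by the assessor
computed_total = motor_subscore(RIGHT_MOTOR)   # 16
if entered_total != computed_total:
    print(f"Summation error: entered {entered_total}, computed {computed_total}")

Even such a simple consistency check would have flagged all 52 summation errors at data entry, freeing the audit panel to focus on genuine classification issues.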

The ISNCSCI, published by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS) together with its publicly available e-learning modules (http://asialearningcenter.com), represents one of the most sophisticated instruments among all assessments for patients with CNS injuries or diseases. It has a long history of revisions [7] and is continuously updated to achieve the highest possible level of consistency, comprehensiveness and face validity [7, 8]. In particular, the 2011 ISNCSCI revision focused on clarifying issues that had proven inconclusive in the 2003 revision [9]. Therefore, classification results of assessors trained with different ISNCSCI versions should not be pooled without checking for group differences. It would be highly interesting to see whether the rates or types of errors differed between SCIPA assessors trained with the 2011 version and those trained with the 2003 version.

Finally, we completely agree that validated ISNCSCI calculators are modern instruments for improving data quality, supporting training, and identifying ISNCSCI issues in need of clarification [2, 3]. However, even validated calculators may misclassify data sets, particularly in the presence of non-SCI-related conditions (e.g., peripheral nerve injury). We therefore strongly believe that with proper training a high level of examination and manual classification accuracy can be achieved, and that only both skills together form the basis of a fully qualified ISNCSCI assessor.
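
As an illustration of the kind of rule such calculators encode, and of why manual determination of motor levels is error-prone, the following minimal Python sketch implements the ISNCSCI motor level rule for one side (the most caudal key muscle graded at least 3, with all rostral key muscles graded 5). The data representation and function name are ours, and the sketch deliberately ignores not-testable (NT) grades and the segments without key muscles, where the motor level is presumed to follow the sensory level:

# Minimal sketch of the ISNCSCI motor level rule for one side.
# Simplifications: grades are plain integers (no NT handling), and the
# list is treated as contiguous, although T2-L1 have no key muscles and
# the motor level there is presumed to follow the sensory level.
KEY_MUSCLES = ["C5", "C6", "C7", "C8", "T1",
               "L2", "L3", "L4", "L5", "S1"]

def motor_level(grades):
    """Most caudal key muscle graded >= 3 with all rostral muscles at 5."""
    level = None
    rostral_normal = True                  # all muscles above graded 5 so far?
    for segment in KEY_MUSCLES:
        if rostral_normal and grades[segment] >= 3:
            level = segment
        rostral_normal = rostral_normal and grades[segment] == 5
    return level                           # None: level lies above C5

# Example: C7 is the most caudal muscle graded >= 3 below fully normal ones.
print(motor_level({"C5": 5, "C6": 5, "C7": 3, "C8": 2, "T1": 1,
                   "L2": 0, "L3": 0, "L4": 0, "L5": 0, "S1": 0}))  # C7

Even this toy version shows how many conditions an assessor must track simultaneously; the full standard adds NT handling, presumed levels and ZPP rules on top, which is precisely where both algorithmic support and thorough training pay off.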

In general, all studies reporting on the use, training and accuracy of ISNCSCI assessments should (1) characterize the evaluated patient population relative to a representative AIS and NLI distribution, (2) report both the frequency and the magnitude of the assessment errors, and (3) specify the training and assess its outcome (e.g., ISNCSCI version, duration and content of the training, any follow-up training, amount of regular use by the trainee, pre-/post-training tests).