Editorial Note on: Spinal Cord advance online publication, 30 October 2012; doi:10.1038/sc.2012.127

I would like to commend the authors for their enormous efforts in systematically evaluating the psychometric properties of the International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI) in young patients (for example, Krisa et al.1 and Mulcahey et al.2). Spinal cord injury (SCI) is rare in young individuals, and the authors have shown great endurance in collecting these data, which is highly appreciated.

In their latest contribution, the authors determined intra- and inter-rater agreement of segmental sensory and motor scores.1 Intra-class correlation coefficients (ICCs) were calculated over rank-transformed data for each segment. Although the analyses seem correct, I suggest that we could have learned more about agreement had the data been analyzed differently. In this note, I will discuss this only for the sensory data of patients with complete lesions.
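To make this concrete, here is a minimal sketch of an ICC computed over rank-transformed segmental scores, assuming a two-way random-effects model with absolute agreement, ICC(2,1), and a pooled rank transform; the scores, the ICC variant and the implementation are illustrative assumptions on my part, not the authors' actual procedure.

```python
import numpy as np
from scipy.stats import rankdata

def icc_2_1(x: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rating.
    x has one row per subject and one column per rater."""
    n, k = x.shape
    gm = x.mean()
    ms_rows = k * np.sum((x.mean(axis=1) - gm) ** 2) / (n - 1)  # between subjects
    ms_cols = n * np.sum((x.mean(axis=0) - gm) ** 2) / (k - 1)  # between raters
    ss_err = np.sum((x - gm) ** 2) - (n - 1) * ms_rows - (k - 1) * ms_cols
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical sensory scores for one dermatome (0 = absent, 1 = impaired,
# 2 = normal): six subjects, two raters.
scores = np.array([[2, 2], [1, 1], [0, 1], [2, 2], [0, 0], [1, 2]], dtype=float)
ranked = rankdata(scores).reshape(scores.shape)  # pooled rank transform
print(f"ICC(2,1) on ranks: {icc_2_1(ranked):.2f}")
```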

First, the analysis did not account for the level of lesion. This is unfortunate, as the impact of poor segmental reliability on ISNCSCI outcomes is larger when evaluating segments around the level of lesion. It might even affect correct classification on the American Spinal Injury Association Impairment Scale, especially in segments lacking myotomes.3 If there are not enough patients with identical levels of lesion to analyze, segments could be grouped according to their relative distance to the (for example, sensory) level of lesion.4
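As a hypothetical illustration of such a grouping (segment list truncated and scores invented; the exact alignment rule is my assumption, not taken from the cited work):

```python
# Dermatomes in rostro-caudal order, truncated to the cervical region for brevity.
SEGMENTS = ["C2", "C3", "C4", "C5", "C6", "C7", "C8", "T1"]

def by_relative_level(scores: dict[str, int], sensory_level: str) -> dict[int, int]:
    """Re-index segment -> score as (distance from sensory level) -> score,
    so that, e.g., 'two segments below the level' can be pooled across
    patients with different lesion levels."""
    level = SEGMENTS.index(sensory_level)
    return {SEGMENTS.index(seg) - level: score for seg, score in scores.items()}

patient = {"C2": 2, "C3": 2, "C4": 1, "C5": 0, "C6": 0}
print(by_relative_level(patient, "C4"))  # {-2: 2, -1: 2, 0: 1, 1: 0, 2: 0}
```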

Second, ICCs reflect to a large extent variability between subjects rather than agreement. Including patients with all levels of lesion will result in lower ICC values for the most rostral (cranial) and most caudal segments and higher ICC values for the segments in between. This was also found by the authors, but it does not necessarily indicate better agreement for these segments. ICCs appeared incalculable for the lower lumbar and sacral segments. Very likely, this means that all subjects reported no sensation (no between-subject variability) and all raters assessed this with perfect agreement.
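To make this dependence explicit, the ICC can be written in its standard variance-component form (a textbook identity, not taken from the paper):

```latex
\mathrm{ICC} = \frac{\sigma^{2}_{\text{subjects}}}
                    {\sigma^{2}_{\text{subjects}} + \sigma^{2}_{\text{raters}} + \sigma^{2}_{\text{error}}}
```

When every patient scores 'absent' in a given caudal segment, \sigma^{2}_{\text{subjects}} = 0 and the ratio degenerates to 0/0; the coefficient is undefined even though the observed agreement is 100%.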

Finally, for clinical practice, more intuitive measures, such as the percentage of agreement or a kappa statistic, might have been more informative, as these reflect the actual level of agreement and are more appropriate for quantifying agreement of a measure with only three ordinal categories (absent, impaired, normal). Indeed, the authors previously performed such analyses quite successfully.5
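For example, a minimal sketch of both measures for a single segment, using invented ratings; scikit-learn's cohen_kappa_score with linear weights is one reasonable choice for a three-level ordinal scale, not necessarily the variant used in the authors' earlier analyses:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Invented ratings for one dermatome: 0 = absent, 1 = impaired, 2 = normal.
rater_a = np.array([0, 0, 1, 2, 2, 1, 0, 2])
rater_b = np.array([0, 1, 1, 2, 2, 2, 0, 2])

percent_agreement = (rater_a == rater_b).mean()  # raw proportion of identical scores
# Chance-corrected agreement; linear weights penalize one-step disagreements
# (e.g. absent vs impaired) less than two-step ones (absent vs normal).
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")

print(f"percentage agreement: {percent_agreement:.0%}, weighted kappa: {kappa:.2f}")
```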