Introduction

Spasticity is a symptom of the upper motor-neuron syndrome and is common among people with spinal cord injury (SCI). The reported prevalence of spasticity among people with SCI varies between studies from 65 to 78%, depending on the definition of spasticity, level and severity of the SCI, method of measurement and possibly the time post injury.1, 2, 3 About half of those with spasticity use antispastic medication.2 In one study, roughly 40% reported the spasticity as problematic.3

An often used definition of spasticity is the one by Lance:4 ‘a motor disorder characterized by a velocity-dependent increase in the tonic stretch reflex with exaggerated tendon jerks, resulting from hyperexcitability of the stretch reflex, as one component of the upper motor-neuron syndrome’. This definition has the benefit of being precise but has been challenged5 and does not reflect the multidimensional nature of spasticity. For the purpose of this study, we used a broader definition developed by the SPASM consortium:6 ‘Disordered sensory–motor control resulting from an upper motor-neuron lesion presenting intermittent or sustained involuntary activity of muscles’. The definition distinguishes spasticity from the passive viscoelastic changes of muscle properties such as contractures, which are also associated with SCI.3 These changes in soft tissues and joints can be very difficult to distinguish clinically from the active part of spasticity.7 It is also important to keep in mind that the symptoms of increased tone, clonus and spasms, and hyperreflexia can exist independently of each other and do not necessarily share the exact same pathophysiology.8

A number of ways to measure spasticity exist: scales for manual testing, biomechanical and electrophysiological methods,7 as well as self-rating scales.

A measurement of spasticity that is simple and easy to administer is important in daily clinical practice, for monitoring fluctuations and changes over time and for measuring the effect of antispastic medication, as well as for research purposes.

Two such clinical measures of spasticity are the Modified Ashworth Scale (MAS) and the self-reported Spasm Frequency Score (SFS). Each of the two scales could be viewed as an indirect measure of a single construct of spasticity. The MAS is a scale of perceived resistance (tone) against passive movement of the limb and is an adjustment of the original Ashworth Scale in which the category 1+ has been added. The SFS is a scale from 0 to 4 on the number of self-reported spasms per day.9 The SFS is an adaptation of the Penn SFS (PSFS), in which the self-reported number of spasms during the last hour is rated.10 Definitions are shown in Table 111 and Table 2.9

Table 1 The Modified Ashworth Scale (MAS)11
Table 2 Spasm Frequency Score9

Correlation between the MAS, other clinical measures and biomechanical measures of spasticity has generally been low, and the validity of the MAS has been questioned.12 Despite the limitations of the MAS, it is still the most widely used scale in the clinical setting,3 and its use is recommended by the National Institute of Neurological Disorders and Stroke Common Data Element.13 The MAS has been tested in populations with spasticity after SCI, stroke, multiple sclerosis, traumatic brain injury and cerebral palsy, with varying results for reliability; in general, the reliability was unsatisfactory.7, 14, 15 The mechanism of spasticity in SCI may differ from that in other upper motor-neuron lesions.16 A literature search found four studies17, 18, 19, 20 that assessed the reliability of the MAS for the lower extremities among people with SCI. The choice of statistical method differed: some used simple (unweighted) kappa, others used weighted kappa and others used correlation coefficients.7 There were also differences in test protocols and in how confounders were controlled. This makes the studies difficult to compare, and in light of this we find that there is still a need to reassess the MAS for the lower extremities after SCI. The need to reassess the MAS and self-reported spasticity scales is also supported by the review from Hsieh et al.14

Self-rating scales are a way of clinically assessing another construct of spasticity. To our knowledge, test–retest reliability has been studied in only one published study of the PSFS and in none of the SFS. That study showed identical test–retest scores for the PSFS in 40% of observations.21 A study of self-reported severity of spasticity on a numeric rating scale from 0 to 10 in a population with multiple sclerosis found the scale to be reliable.22 Correlations between self-reported spasticity and examiner-based assessments have shown varying results, from significant to poor correlation.21, 23

The aim of this study is to test the reliability of MAS and SFS in a population of individuals with SCI in a standardized setting and test the correlation between the two scales.

Materials and methods

Participants

Data were collected from February 2010 to February 2011. A total of 31 participants with SCI and the presence of spasticity in the lower extremities were enrolled in the study. All participants were inpatients at the Clinic for Spinal Cord Injuries, Rigshospitalet, Hornbaek, Denmark. Inclusion criteria were SCI, presence of spasticity and good general health condition. Exclusion criteria were new or ongoing urinary tract infection and severe limitation in range of motion (ROM), defined as a limited ROM of knee and hip together of >30°, the same criterion as previously used by Craven and Morris 2010.20

Raters and procedure

All assessments were carried out by three physiotherapists who were experienced in treating people with SCI and used the MAS in their daily practice. To ensure that the MAS assessment was performed in a standardized and uniform manner,3 the three raters practiced the MAS assessment and interpretation prior to the study. During the data collection, the raters did not discuss the testing procedures, outcomes or other study-related issues.

To keep testing conditions as constant as possible, all assessments were carried out at the same time in the morning, before participants got out of bed, and participants only wore the clothes they had been sleeping in. This was done to keep activity level before testing at an absolute minimum. Participants who did not have an indwelling catheter went to the bathroom before testing.

The participants were moved directly from bed to bench in supine position to avoid any unnecessary activity. Testing was always performed for the right side first in the order of hip flexors, hip extensors, knee flexors, knee extensors, ankle plantar flexors and ankle dorsiflexors. Then, the rater moved around the bench and performed the same procedure for the left side.

On the day of MAS assessment, participants were screened for symptoms of urinary tract infection24 and asked whether they felt ok, and pain was documented if present. According to the protocol, the rater was instructed to postpone testing to another day if the participant did not feel ok. However, this was not necessary for any of the tests. Room temperature was measured, and the participant was asked to rate the number of spasms the day before according to the SFS.

During each movement raters counted ‘one second’ to move the joint as smoothly and constantly as possible, to imitate the way MAS was performed in the original Bohannon study.11 It was decided not to use a metronome, as we wanted the test procedure to be similar to the daily clinical setting.

If the movement triggered clonus, making MAS assessment impossible, it was recorded as clonus and treated as a missing value in the MAS analysis.

Information on medication was collected from medical records. No changes in regular medication were made between the two days, and medicine was administered at the same time of the day.

All tests were performed four times on each participant. Two ratings were performed on the first test day (Tuesday) and two on the second test day (Thursday) within the same week. Two different raters each performed one test on the first day for inter-rater evaluations. On the second day, one of the raters from the first test day performed a retest for intra-rater evaluations and a third rater performed the last test, also for inter-rater evaluations. The raters followed a rolling schedule to minimize possible inter-rater bias, by distributing the tests in a more random manner among the raters.

Statistical analysis

Statistical analysis was performed with the SAS software package version 9.4 (SAS Institute Inc., Cary, NC, USA) and IBM SPSS Statistics version 22 (IBM Corp. Released 2013. Armonk, NY, USA). Kappa (κ) statistics were chosen as the reliability measure for MAS and SFS, as this is recommended as the most appropriate measure.7, 12 The original Ashworth Scale (AS) and the MAS were designed as ordinal scales, but it has been questioned whether the MAS can be treated as such or should rather be considered a nominal scale.12 Whether the scale is considered ordinal or nominal is important for the choice of statistics. Both simple and weighted kappa values for MAS were calculated for comparison, for reasons discussed further below. The SFS was considered an ordinal scale. When using weighted kappa, different weighting schemes can be chosen.25 Quadratic weights were chosen, as they yield a kappa coefficient that is equivalent to the intraclass correlation coefficient (ICC) under certain conditions.26, 27
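
For reference, one common way of writing Cohen's kappa with disagreement weights is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the analysis itself: p_ij is the observed proportion of paired ratings in category i for one rating and category j for the other, p_i· and p_·j are the corresponding marginal proportions, k is the number of categories and w_ij is the disagreement weight.

```latex
\kappa_w = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{ij}}
                    {\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{i\cdot}\, p_{\cdot j}},
\qquad
w_{ij} =
\begin{cases}
  \dfrac{(i-j)^2}{(k-1)^2} & \text{quadratic weights} \\[1.5ex]
  1 - \delta_{ij}          & \text{simple (unweighted) kappa}
\end{cases}
```

With w_ij = 1 − δ_ij every disagreement counts equally, which gives the simple kappa; with quadratic weights a disagreement of one category is penalized far less than a disagreement of several categories.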

Intra-rater reliability (κ) was calculated between the two ratings carried out by the same rater on different days. Inter-rater reliability was calculated between the two ratings carried out on the same day by different raters. Data from day one and two were merged for the inter-rater analysis. The same procedure was used for reliability testing of SFS.

Interpretation of strength of agreement of the κ-value is based on Landis and Koch:28 Poor κ<0; Slight κ=0–0.20; Fair κ=0.21–0.40; Moderate κ=0.41–0.60; Substantial κ=0.61–0.80; and Almost perfect κ=0.81–1.00.
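
The interpretation bands above can be expressed as a small lookup, sketched here in Python. The function name `landis_koch_label` is hypothetical, and treating each upper bound as inclusive (so that, for example, κ=0.205 falls in ‘Fair’) is an assumption, since the published bands leave the gaps between 0.20 and 0.21 and so on unspecified.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch strength-of-agreement label.

    Assumption: upper bounds are inclusive and values between two published
    bands (e.g. 0.205) are assigned to the higher band.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "Poor"
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost perfect"  # not reached because of the range check above


# Example with two illustrative kappa values.
for value in (0.35, 0.85):
    print(value, landis_koch_label(value))
```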

The difference between the first and second rating on each MAS test day is reported as the median and interquartile range of the differences between the two corresponding ratings. The same procedure was used to test for systematic differences between raters.
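
A minimal sketch of such a summary is given below, under stated assumptions. The numeric coding of the MAS categories (`MAS_RANK`, with 1+ placed between 1 and 2), the direction of the difference (second minus first) and the quartile method are all assumptions made for illustration; the statistical packages used in the study may code the scale and compute quartiles differently. The ratings are invented.

```python
import statistics

# Assumed rank coding of the MAS categories; not taken from the study.
MAS_RANK = {"0": 0, "1": 1, "1+": 2, "2": 3, "3": 4, "4": 5}


def difference_summary(first, second):
    """Return (median, (Q1, Q3)) of second-minus-first rating differences."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired ratings")
    diffs = [MAS_RANK[s] - MAS_RANK[f] for f, s in zip(first, second)]
    # 'inclusive' treats the data as the whole population of interest;
    # other quartile definitions can give slightly different values.
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    return statistics.median(diffs), (q1, q3)


# Hypothetical paired ratings of one muscle group on the same day.
first_rating = ["1", "0", "2", "1+", "0", "1", "0", "3"]
second_rating = ["1", "0", "1+", "1+", "1", "1", "0", "2"]
print(difference_summary(first_rating, second_rating))
```

A negative quartile indicates that the second rating tended to be lower than the first, a positive one that it tended to be higher.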

For correlation analysis between MAS scores and SFS self-reported frequencies of spasms, Spearman’s rank correlation coefficient was used with Bonferroni correction for multiple comparisons.
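
Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below illustrates only the coefficient; P-values and the Bonferroni correction are left out, and the function names and example data are hypothetical rather than taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: MAS coded as ranks 0..5 and SFS scores for six participants.
mas_ranks = [0, 1, 2, 0, 3, 1]
sfs_scores = [4, 3, 4, 2, 4, 3]
print(round(spearman_rho(mas_ranks, sfs_scores), 2))
```

Because both scales have few categories, ties are frequent, which is why the average-rank form is used here rather than the shortcut formula based on squared rank differences.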

Statement of ethics

We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research. All participants received written as well as oral information about the study before giving their written consent. One participant was below 18 years of age, and both parents gave written consent. The study followed the guidelines of the Helsinki Declaration.

Results

Cohort data

Thirty-four participants with SCI were recruited for the study. Three participants were excluded because of urinary tract infection appearing during the study period, leaving 31 participants for analyses. Study population characteristics are described in Table 3 and reported according to recommendations.29

Table 3 Study population characteristics

Room temperature varied between 20 and 23 °C over the data collection period. Room temperature did not vary across test days for each participant.

The distribution of MAS scores by muscle group is shown in Figure 1. The value zero predominates, in particular for knee extensors, ankle dorsiflexors and hip flexors. The overall distributions of MAS and SFS scores are shown in Figures 2 and 3, respectively. The total number of ratings from all four assessments is reported.

Figure 1

Distribution of MAS values based on location. The figure shows the distribution of MAS values (cf. Table 1) based on location of the muscle group. The counts on the vertical axis show the total counts of all ratings performed. Right and left sides are added together. A, hip flexors; B, hip extensors; C, knee flexors; D, knee extensors; E, ankle dorsiflexors; F, ankle plantar flexors. MAS, Modified Ashworth Scale.

Figure 2

Distribution of MAS values. The figure shows the distribution of all MAS values (cf. Table 1). MAS, Modified Ashworth Scale.

Figure 3

Distribution of SFS values. The figure shows the distribution of all SFS values (cf. Table 2). SFS, Spasm Frequency Score.

The median difference (interquartile range) between the first and second rating was 0 (−0.25;0) on the first day and 0 (0;1) on the second day. Similar analyses at the level of the individual raters suggested that one rater had a tendency to score higher than the other rater on the same day and that another rater had a tendency to score lower than the other rater on the same day (results not shown).

MAS reliability

Intra-rater reliability ranged for the weighted kappa from ‘substantial’ to ‘almost perfect’, except for ankle dorsiflexors, which were ‘poor’ to ‘slight’. Simple kappa ranged from ‘poor’ to ‘moderate’ for all muscle groups.

Inter-rater reliability ranged for the weighted kappa from ‘fair’ to ‘substantial’, except ankle dorsiflexors, which had ‘poor’ to ‘slight’ reliability. Simple kappa ranged from ‘poor’ to ‘fair’.

Table 4 shows the results of intra- and inter-rater reliability of the 12 assessed muscle groups.

Table 4 Modified Ashworth Scale intra- and inter-rater reliability

Clonus, resulting in a missing value, was reported for plantar flexors in 24% of the cases for intra-rater ratings and 29% for inter-rater ratings, calculated as an average of left and right side. It was evenly distributed between the right and left sides. A few instances of clonus were reported for dorsiflexors but none for the remaining muscle groups.

SFS reliability

Intra-rater reliability of SFS showed weighted κ=0.94 (95% CI 0.87–1) and simple κ=0.80 (95% CI 0.62–0.98). Inter-rater reliability showed weighted κ=0.93 (95% CI 0.89–0.98) and simple κ=0.74 (95% CI 0.60–0.88). The weighted kappa was interpreted as ‘almost perfect’ and the simple kappa as ‘substantial’ for both groups. The crude agreement corresponding to the kappa values was approximately 80% for both groups.

Correlation between MAS and SFS

Spearman’s rank correlation rho was calculated between observed spasticity (MAS) and self-reported spasms (SFS). The correlation coefficients ranged from negative to weakly positive (Table 5). None of the correlations reached the significance level (P<0.004) after Bonferroni correction for multiple comparisons.
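
The threshold of 0.004 is consistent with a Bonferroni correction over the 12 muscle-group comparisons, assuming a family-wise significance level of 0.05. Both the symbols and the value α=0.05 are assumptions made for this illustration; the family-wise level is not stated explicitly in the text.

```latex
\alpha_{\text{adjusted}} = \frac{\alpha}{m} = \frac{0.05}{12} \approx 0.0042
```

Here m is the number of comparisons, so an individual correlation would need P below roughly 0.004 to be regarded as significant.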

Table 5 Correlations between MAS and SFS

Discussion

The objective of the study was to test reliability of the MAS and SFS in lower extremities in a population of SCI individuals and to test how the two scales correlate, as advised by previous reviews.7, 14

Reliability of the MAS is greatly affected by the choice of weighted or simple kappa coefficients. Overall with the weighted kappa values, both intra- and inter-rater reliability was satisfactory, except for ankle dorsiflexors, where the poorest reliability has also been found in previous studies.17 With simple kappa, neither intra- nor inter-rater reliability was satisfactory. Intra-rater reliability (test–retest by the same rater) was higher than inter-rater reliability, irrespective of weighting. Results showed that some raters had a tendency to systematically score higher than others on the MAS. This could partly explain some of the disagreement between raters.

The standardization made in the study, in order to minimize confounding factors, could give rise to a higher reliability than may be seen in the clinical setting, where the same standardization would not always be possible. Examples of this are the activity prior to examination, positioning of patient during examination, angle of joints during measurements where the inertia could affect hip and knee differently, etc. It is thus important for repeated measures that the measures are taken in a very standardized way. Repeated measures of MAS should when possible be performed by the same rater, as reliability is higher and the measure could be performed in a more uniform manner. This is in line with the recommendation of previous studies.7

The prerequisite for using a weighted kappa is that the scale can be considered an ordinal level of measure. It follows that a higher MAS score should reflect a higher degree of spasticity. The ordinal nature of MAS has been questioned for a number of reasons.12 First, it has been argued that adding the 1+ category introduced ambiguity in the scale, as the ordinality of the 1 and 1+ categories is questionable, which would make it a nominal scale based on subjective criteria. Second, the definition of MAS lacks biomechanical references for ‘catch’ and ‘release’. Third, the MAS description ‘minimal resistance to passive movement at the end of movement’ is hard to distinguish from the passive viscoelastic changes often seen after neurological injury. These last two arguments are also indirectly supported by the poor correlation between MAS and biomechanical measures documented in previous studies.30, 31

Most of the reported MAS values were 0, 1 and 1+ (~80%, Figure 2). If MAS is regarded as a nominal scale, only exact agreement will be captured as agreement between raters, as simple kappa is blind to off-diagonal association when more than two categories exist.25 Setting aside the validity issues of the 1+ category of MAS and viewing the scale purely from a reproducibility point of view, one could argue that a difference between two raters of one rank category is a sign of better agreement than a difference of two or more. This partial agreement would be captured by the weighting scheme and could argue for including the weighted kappa in the evaluation alongside the simple kappa. It also suggests that the poor reliability seen in the simple kappa is due to problems in distinguishing between values at the lower end of the scale. The fact that it is difficult to distinguish between spasticity and changes in viscoelastic properties, and that a certain degree of contractures was present, would very likely be an explanation for the poor reliability.
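
The difference between the two coefficients can be illustrated with a small constructed example, sketched below in Python. The data are invented and use four ordered categories coded 0 to 3 rather than actual MAS ratings; the function `kappa` is a hypothetical minimal implementation of Cohen's kappa, not the routine used in the statistical packages named in the methods. Both comparison sets agree exactly on half of the ratings and have identical marginals, but in one set every disagreement is a single category apart and in the other every disagreement is two categories apart.

```python
from collections import Counter


def kappa(r1, r2, k, quadratic=False):
    """Cohen's kappa for paired integer ranks 0..k-1 (simple or quadratic-weighted)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be paired and non-empty")
    n = len(r1)
    pairs = Counter(zip(r1, r2))
    m1, m2 = Counter(r1), Counter(r2)

    def weight(i, j):
        # Disagreement weight: squared rank distance, or 0/1 for simple kappa.
        return ((i - j) / (k - 1)) ** 2 if quadratic else float(i != j)

    observed = sum(weight(i, j) * c for (i, j), c in pairs.items()) / n
    expected = sum(weight(i, j) * m1[i] * m2[j]
                   for i in range(k) for j in range(k)) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1 - observed / expected


# Hypothetical ratings on four ordered categories coded 0..3.
rater_1 = [0, 1, 2, 3, 0, 1, 2, 3]
near_miss = [0, 1, 2, 3, 1, 0, 3, 2]  # disagreements are one category apart
far_miss = [0, 1, 2, 3, 2, 3, 0, 1]   # disagreements are two categories apart

for name, other in (("near", near_miss), ("far", far_miss)):
    print(name,
          round(kappa(rater_1, other, 4), 2),
          round(kappa(rater_1, other, 4, quadratic=True), 2))
```

In this constructed case the simple kappa is about 0.33 for both sets, whereas the quadratic-weighted kappa is about 0.80 for the near-miss set and about 0.20 for the far-miss set, which is the partial agreement that the simple kappa cannot see.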

SFS has not previously been tested for reliability, apart from one study showing a crude agreement of 40% for the PSFS, which is a scale similar to the SFS. The present study showed almost perfect reliability whether data were collected by the same or by different raters. As the scale records a self-reported number of spasms, one could expect a smaller difference between intra- and inter-rater reliability than with a rater-based scale. Our results support this assumption.

The distribution of the SFS ratings shows the majority of ratings to be in the highest category 4, followed by categories 3 and 2. Only a minority of the answers were in category 1 or 0, meaning few or no spasms per day. One of the inclusion criteria was the presence of spasticity, so the low number of the value zero is not surprising, but the skew toward the high values could also be interpreted as a sign of low sensitivity of the scale, as previously suggested.7 It should also be noted that the questions in SFS are not location specific. The patients' understanding of what is meant by spasms could also vary, as it is not defined in the description of the scale, and this could be an explanation for the skew toward the higher SFS values. The distribution of MAS was inversely skewed compared with SFS. Correlation analysis between MAS ratings and SFS shows a weak, non-significant correlation in 10 of 12 ratings. The remaining 2 of 12 showed a negative, non-significant correlation. The inverse skews and poor correlation indicate that MAS and SFS examine two different aspects of spasticity. A previous study by Priebe et al.21 supports the finding of poor correlation between self-reported severity and the number of spasms and clinical examination scales (modified PSFS). It has previously been shown that 60% of people with self-reported spasticity after SCI elicit spasticity from movement,3 which stresses the importance of adding other measures of spasticity to the MAS in order to capture more aspects of spasticity.

As movements of the limb can affect MAS,12 we tested for the difference between the first and second rating to see whether the second rating would be lower than the first rating. The median of the difference was zero on both days, with the interquartile range on the first day suggesting a lowering of the value between the first and second rating, whereas the opposite was the case on the second test day. Hence, the movements do not seem to have systematically affected the MAS measure.

The study was designed to reflect the use of MAS in the clinical setting, and the study population was chosen to reflect this. A certain degree of contractures is very common in the SCI population, and because of this a certain degree of contractures among the participants was accepted. It was chosen not to use a metronome, but as we do not have documentation on the actual speed of the movement, this could be a confounder when distinguishing the active part of spasticity from passive changes in viscoelastic properties.

Clonus was present in roughly a quarter of the plantar flexor measurements, and these measurements were excluded from the κ-analysis, which might have affected the reliability. The skews in MAS and SFS have probably also affected the kappa scores. The rolling schedule of the raters led to an uneven distribution of the number of ratings each rater performed in total, which could also be a confounder.

The regular medicine of the participants was administered at the same time of the day, but the time when the participant actually took the medicine was not documented. Time of awakening in the morning was documented as part of ensuring that the tests were the first morning activity for the participants, and they were asked whether they felt ok. However, the total amount of sleep, or lack of it, was not documented.

Conclusion

The MAS is found to have acceptable reliability when partial agreement is included (weighted kappa), under the assumption of ordinality of the scale, but there is poor intra- and inter-rater reliability for exact agreement (simple kappa). Repeated measures should be performed by the same rater when possible, as the intra-rater reliability is higher, and the measurements should be taken in a very standardized way. Clonus and limited range of motion, which are highly prevalent in the SCI population, complicate assessment. Together with the well-known difficulties in distinguishing spasticity from passive viscoelastic changes, these are limitations of the scale. The standardization of the setting in which MAS was measured for this study could have given rise to a higher reliability than would be seen in the clinical setting, where the same standardization is not always possible. The SFS is highly reliable for both intra- and inter-rater assessment, which is a benefit in the clinical setting, but the scale might have low sensitivity. Self-reported spasms on SFS and clinical examination with the MAS correlate poorly with one another and show inverse skews. Our conclusion is that SFS and MAS assess different constructs of spasticity.

Data archiving

There were no data to deposit.