Introduction

The International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set (ISCIMSBDS) aims to cover the most important musculoskeletal (MSK) problems that affect people with SCI.1 The ISCIMSBDS is one of several International SCI Data Sets developed under the umbrella of the International Spinal Cord Society and the American Spinal Injury Association to standardize data collection. Standardized data collection is important for improving the examination, treatment, rehabilitation and prevention of SCI, and for facilitating the comparison of research results across SCI centers and countries.2 The ISCIMSBDS form can be found in Appendix A.

MSK problems are common in people with SCI and include spasticity, fractures, heterotopic ossification (HO) and contractures. For example, 60–70% of people with SCI develop spasticity within a year, and about half of these receive antispastic medication.3, 4, 5 In addition, age-related MSK problems are increasing in people with SCI compared with able-bodied persons.1, 6 The incidence of fractures ranges from 1% to 34%.7 The relative risk of fracture in individuals with SCI is double that of controls, with a particularly high risk of lower-extremity fractures and fragility (low-energy) fractures.8 The risk of fracture increases with the severity of motor impairment.9 There are no accurate data on the incidence of HO, although it is estimated that between 10% and 53% of people with SCI develop HO.10, 11 The incidence of contractures in major joints 1 year after SCI was found to be 11–43%, with the ankle, wrist and shoulder most commonly affected.12 Contractures are a common and disabling problem for individuals with SCI and a challenge for clinicians to manage.13 Degenerative changes or overuse injuries are most often located in the upper extremities, particularly the shoulders, elbows and wrists, as well as in the neck and the upper and lower back.1 Nearly all individuals who sustain their SCI at a young age develop scoliosis.14 It is important to capture all these MSK problems in people with SCI, and the ISCIMSBDS was designed for this purpose. However, its reliability and validity must be determined. The objective of this study was therefore to determine the intra- and inter-rater reliability of the ISCIMSBDS and to discuss its content validity.1

Material and methods

Study design

The study was designed as a test–retest reliability study in which two measures of reliability were assessed: intra- and inter-rater reliability. Intra-rater reliability describes how well the same rater can reproduce the data on two occasions in the same group, whereas inter-rater reliability describes reproducibility when two different raters collect the data. The study was carried out at four SCI centers, one on each of four continents: in Australia, India, the United States of America and the United Kingdom. Each center recruited 30 participants with SCI, giving a total of 120. Participants were enrolled from April 2013 to March 2014. Participants were included if they were >18 years of age and had sustained their SCI at least 6 months earlier, regardless of the level or etiology (traumatic or non-traumatic) of the SCI. Participants could have any number or severity of MSK symptoms, provided these were stable and no changes in physical therapy or in medication for pain or spasticity were expected between interviews. Participants were recruited as a convenience sample and included both inpatients and outpatients. All were recruited through personal contact (none by letter or phone) in a hospital or SCI clinic setting. One center also recruited from a local residential home for people with SCI, and another from an SCI summer camp (EmpowerSCI, Inc.). Three participants were excluded post hoc because they did not meet all of the inclusion and exclusion criteria. Consequently, 57 people participated in the intra-rater part and 60 in the inter-rater part of the study.

Each study site had two raters. All raters were experienced SCI health-care professionals (physiotherapists and medical doctors). The first rater performed all the intra-rater tests. Inter-rater (inter-observer) reliability was tested by two different raters, one of whom was the rater who performed all the intra-rater tests. The ISCIMSBDS was completed by patient interview and, where necessary, by review of the patient’s medical records and by physical examination. Physical examination was often needed when evaluating contractures, degenerative joint changes and scoliosis. This relatively informal approach to data collection reflects the way the ISCIMSBDS will be used in clinical and community settings.

Content validity was evaluated through focus group interviews with health professionals and consumers, thus drawing on recognized subject matter experts from different domains, to evaluate the extent to which the variables of the ISCIMSBDS adequately reflect the content domain15, 16 and whether the wording of the variables was appropriate. The health professionals were involved in SCI management and would hence be potential users of the ISCIMSBDS in clinical practice. Consumers with SCI were recruited from the Indian Spinal Injuries Centre to form three focus groups, each with four individuals. They were aged between 26 and 50 years, included both women and men, and were at least 6 months after injury. Group interviews were performed at the Indian Spinal Injuries Centre in New Delhi. The study was explained to all participants in the four groups. The discussions were facilitated and moderated by one of the investigators. Comments on the relevance of each item in the ISCIMSBDS were compiled separately for each group, and a final consensus was reached within each group. Finally, the panel of three investigators reached a consensus on the data set. A total of seven discussions were conducted (four with consumer groups and three with the expert group), each lasting between 1.5 and 2 h. The discussions were conducted in English, and all the experts, consumers and investigators were fluent in English.

Statistical analysis

Cohen’s Kappa (κ) was used to determine reliability because it provides an estimate of agreement corrected for chance. However, Cohen’s Kappa is influenced by the prevalence (frequency) of a condition and by systematic bias, so crude (percentage) agreement was also determined.17 Data from all four centers were pooled, as all included participants met the same inclusion criteria and all raters were representative of those who will use the ISCIMSBDS in the clinical setting. A single κ-value across all centers was calculated for intra- and inter-rater reliability, respectively.
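As an illustration of the two agreement measures, the Python sketch below computes crude (percentage) agreement and Cohen’s κ from paired answers of two raters. It is a minimal example under stated assumptions and not the SAS or SPSS procedure used in the study. The function names and the ten hypothetical answer pairs are illustrative only. κ is computed as (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater’s marginal proportions.

```python
import math
from collections import Counter


def crude_agreement(rater_a, rater_b):
    """Proportion of paired observations on which both raters gave the same answer."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = crude_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of each rater's marginal proportion
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n) for c in set(counts_a) | set(counts_b)
    )
    if p_expected == 1.0:
        # Both raters used one and the same category for every case: kappa is undefined
        return math.nan
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical yes/no answers from two raters for ten participants (not study data)
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
print(crude_agreement(rater_a, rater_b))         # 0.8
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

With these hypothetical answers, the raters agree on 8 of 10 participants (80%). Half of that agreement would be expected by chance, so κ is only 0.6.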

κ-Values were interpreted based on Landis and Koch, 1977,17 where a score <0 reflected poor agreement, 0.0–0.20 reflected slight agreement, 0.21–0.40 reflected fair agreement, 0.41–0.60 reflected moderate agreement, 0.61–0.80 reflected substantial agreement and 0.81–1.00 reflected almost perfect agreement. κ-Values >0.61 (reflecting at least substantial agreement) with a crude (percentage) agreement of >90% were considered satisfactory.18
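The interpretation rules can be written as two small functions. The first assigns the Landis and Koch descriptor, and the second applies the study’s satisfactory criterion (κ > 0.61 and crude agreement > 90%). This is a hypothetical sketch. In particular, rounding κ to two decimals before banding is our assumption about how values between the listed limits (e.g. 0.205) would be treated.

```python
def landis_koch_label(kappa):
    """Landis and Koch (1977) descriptor for a kappa value."""
    # Assumption: kappa is rounded to two decimals so that every value falls into
    # one of the published bands (<0, 0.00-0.20, 0.21-0.40, ..., 0.81-1.00)
    k = round(kappa, 2)
    if k < 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


def is_satisfactory(kappa, crude_agreement_pct):
    """Study criterion: kappa > 0.61 together with crude agreement > 90%."""
    return kappa > 0.61 and crude_agreement_pct > 90.0


# kappa of 0.59 with a hypothetical crude agreement of 95%
print(landis_koch_label(0.59), is_satisfactory(0.59, 95.0))  # moderate False
```

Under this sketch, any κ in the ‘moderate’ band fails the criterion regardless of how high the crude agreement is.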

Frequency of the MSK problems was calculated as the mean of the two raters’ recordings. The data set consists of variables with main questions that are answered ‘yes’ or ‘no’. If a main question is answered ‘yes’, its subcategory questions are then answered. Agreement on subcategory questions was calculated only when both raters answered ‘yes’ to the main question. Instances with missing data were excluded from the analysis (the N used in each analysis is shown in Tables 2A and 2B).
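The rule for sub-questions can be sketched as a filter that is applied before any agreement statistic is computed. The record layout (keys main_a, main_b, sub_a and sub_b, with None for a missing answer) and the function name are hypothetical structures chosen for illustration. They do not represent the OpenClinica export format.

```python
def subcategory_pairs(records):
    """Paired sub-question answers that are eligible for agreement analysis.

    Each record is a hypothetical dict holding the two raters' answers to a main
    yes/no question ('main_a', 'main_b') and to one of its sub-questions
    ('sub_a', 'sub_b'); None marks a missing answer.
    """
    pairs = []
    for record in records:
        # Sub-questions are analysed only when both raters answered 'yes' to the main question
        if record["main_a"] != "yes" or record["main_b"] != "yes":
            continue
        # Instances with missing data are excluded
        if record["sub_a"] is None or record["sub_b"] is None:
            continue
        pairs.append((record["sub_a"], record["sub_b"]))
    return pairs


# Hypothetical records for five participants (not study data)
records = [
    {"main_a": "yes", "main_b": "yes", "sub_a": "yes", "sub_b": "yes"},  # kept
    {"main_a": "yes", "main_b": "no", "sub_a": "yes", "sub_b": None},    # raters disagree on main question
    {"main_a": "yes", "main_b": "yes", "sub_a": None, "sub_b": "no"},    # missing sub-question answer
    {"main_a": "no", "main_b": "no", "sub_a": None, "sub_b": None},      # condition absent
    {"main_a": "yes", "main_b": "yes", "sub_a": "no", "sub_b": "yes"},   # kept
]
print(subcategory_pairs(records))  # [('yes', 'yes'), ('no', 'yes')]
```

The retained pairs would then be passed to the κ and crude-agreement calculations. For this reason, the N for a sub-question is usually much smaller than the N for its main question.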

Agreement for the categories ‘Fractures’, ‘HO’, ‘Contractures’ and ‘Degenerative changes/overuse’ was first calculated for each location in the data set,1 and all locations were then summed in a single 2 × 2 table for κ-analysis. Both location and side had to match for the two answers to count as an agreement. ‘Fractures’ and ‘Degenerative changes/overuse’ had 28 locations to choose from, and ‘HO’ and ‘Contractures’ had 16. For ‘Fractures’ and ‘Degenerative changes/overuse’, this gave a total N of 1596 (28 × 57) for intra-rater analyses and 1680 (28 × 60) for inter-rater analyses. For ‘HO’ and ‘Contractures’, the totals were 912 (16 × 57) and 960 (16 × 60) for intra- and inter-rater analyses, respectively.
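The pooling of locations into one 2 × 2 table can be sketched as follows. Each rater’s answer is represented as a set of (location, side) keys. A fracture recorded at the left knee by one rater and at the right knee by the other therefore counts as two disagreements, in line with the requirement that both location and side must match. The data layout, function names and four-key location list in the example are hypothetical. The data set’s own list has 28 or 16 entries, depending on the variable.

```python
def pooled_location_table(participants, locations):
    """Sum per-location agreement of two raters into one 2 x 2 table.

    participants: list of (marked_by_a, marked_by_b) pairs, each a set of
        (location, side) keys recorded by that rater for one participant.
    locations: every (location, side) key offered by the data set.
    Returns (both_yes, a_only, b_only, both_no).
    """
    both_yes = a_only = b_only = both_no = 0
    for marked_a, marked_b in participants:
        for key in locations:
            in_a, in_b = key in marked_a, key in marked_b
            if in_a and in_b:
                both_yes += 1
            elif in_a:
                a_only += 1
            elif in_b:
                b_only += 1
            else:
                both_no += 1  # agreement on absence at this location
    return both_yes, a_only, b_only, both_no


def kappa_2x2(both_yes, a_only, b_only, both_no):
    """Cohen's kappa from a 2 x 2 table of two raters' yes/no answers."""
    n = both_yes + a_only + b_only + both_no
    p_observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal 'yes' and 'no' proportions
    p_expected = ((both_yes + a_only) / n) * ((both_yes + b_only) / n) + (
        (b_only + both_no) / n
    ) * ((a_only + both_no) / n)
    if p_expected == 1.0:
        return float("nan")  # every observation in one cell: kappa undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical four-key location list and three participants (not study data)
locations = [("Knee", "left"), ("Knee", "right"), ("Hip/femur", "left"), ("Hip/femur", "right")]
participants = [
    ({("Knee", "left")}, {("Knee", "left")}),   # same location and side
    ({("Knee", "left")}, {("Knee", "right")}),  # same location, different side
    (set(), set()),                             # nothing recorded by either rater
]
table = pooled_location_table(participants, locations)
print(table)                        # (1, 1, 1, 9)
print(round(kappa_2x2(*table), 2))  # 0.4
```

Every participant contributes one count per location. The table total is therefore the number of participants multiplied by the number of locations. With 28 fracture locations and 57 intra-rater participants, this gives 28 × 57 = 1596.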

Data collection and data management were carried out with OpenClinica,19 which is an open source web-based software platform for managing clinical research.

Statistical analyses were performed using the SAS statistical software version 9.4 for Windows (SAS Institute Inc., Cary, NC, USA) and IBM SPSS Statistics version 22 for Windows (IBM Corp., Released 2013, Armonk, NY, USA).

Statement of ethics

We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research, and the necessary approvals were obtained in each center. OpenClinica, which was used for data collection in this study, is designed to support regulatory guidelines such as 21 CFR Part 11.20

Results

Demographics

The characteristics of participants are listed in Table 1.

Table 1 Characteristics of participants

The mean (s.d.) time between interviews was 8.7 (3.3) days (median 7, interquartile range 7–11 days).

Frequency of symptoms

Frequency of symptoms in the study sample for the intra- and inter-rater groups is shown in Figure 1.

Figure 1

Frequency of symptoms in order from highest to lowest frequency. 1: ‘Presence of spasticity/spasms’; 2: ‘Treatment of spasticity/spasms within 4 weeks’; 3: ‘Do any of the above musculoskeletal challenges interfere with activities of daily living (transfers, walking, dressing, showers, etc.)?’ = ‘Yes, a little’ OR ‘Yes, a lot’; 4: ‘Contractures’ (1); 5: ‘Scoliosis’; 6: ‘Fractures since spinal cord lesion’ (1); 7: ‘Other musculoskeletal problems’ (yes/no); 8: ‘Degenerative changes/overuse’ (1); 9: ‘Heterotopic ossifications’ (1); 10: ‘Neuro-musculoskeletal history before spinal cord lesion’.

Neuro-musculoskeletal history before spinal cord lesion

Frequency of ‘Neuro-Musculoskeletal history before spinal cord lesion’ was low for all three categories (Figure 1).

Two participants (3%) from the intra-rater group had ‘Preexisting congenital deformities of the spine and spinal cord’. Crude agreement was 100% for the subcategory questions for these two participants, and there was 100% agreement for diagnosis, location, surgery and date of surgery. There were no reported ‘Preexisting congenital deformities’ for the inter-rater group.

One participant in the intra-rater group and three participants in the inter-rater group had ‘Preexisting degenerative spine disorders’. Intra-rater crude agreement was 98% (56/57), and inter-rater agreement was 98% (58/59). All raters agreed on the subcategories titled diagnosis, location, previous surgery and date for the few cases where the condition was present.

No participants in either group had ‘Preexisting systemic neuro-degenerative disorders’, and thus for both groups there was 100% agreement on the absence of symptoms (Tables 2A and 2B).

Table 2A Intra-rater reliability for the variables in the MSK data set
Table 2B Inter-rater reliability for the variables in the MSK data set

Presence of spasticity and treatment of spasticity

‘Presence of spasticity/spasms’ was reported in 78–81% of the study sample. Half of all participants received ‘Treatment for spasticity/spasms’ within the past 4 weeks. There was almost perfect intra- and inter-rater reliability for the ‘Presence and treatment of spasticity’ (Tables 2A and 2B).

Fractures

Fractures were located in the lower body with most in the ‘Hip/femur’, followed by ‘Tibia/fibula’, ‘Knee’, ‘Foot’ and ‘Ankle’. The only fractures reported for the upper body were in the ‘Hand’. Both intra- and inter-rater reliability were almost perfect (the high crude agreement reflects the many locations where no symptoms were reported and hence agreement on the absence of a fracture).

The intra- and inter-raters agreed on the year of the fracture in 77% and 76% of cases, respectively. Of these, 50% and 38%, respectively, also agreed on the day and month. Median (interquartile range) time since the fracture was 6 years (2–31) in the intra-rater group and 11 years (5–14) in the inter-rater group.

Intra-raters classified 25% of the fractures as a ‘Fragility fracture’, compared with 63% for inter-raters. Intra-rater reliability was satisfactory (Table 2A), but inter-rater reliability was unsatisfactory (Table 2B).

Heterotopic ossifications

‘HO’ was reported only for the ‘Hip/femur’, apart from one disagreement about HO at the knee. In the intra-rater group, HO was documented four times by X-ray and once by computed tomography plus magnetic resonance imaging, and the raters agreed in all cases. In the inter-rater group, the raters agreed twice that X-ray had been used and disagreed once (X-ray versus triple-phase bone scan). It was not possible to determine whether both methods had been performed.

Contractures

‘Contractures’ were reported at all locations, most frequently at the ‘Hip’, ‘Knee’ and ‘Ankle’. Reliability was satisfactory for both the intra- and inter-rater groups. Intra-rater reliability for individual locations ranged from substantial to almost perfect (Table 2A). Inter-rater reliability ranged from moderate to almost perfect (Table 2B), with the lowest reliability for the ‘Hip/femur’ and ‘Knee’ locations.

Degenerative changes or overuse

There was a high number of recordings of ‘Degenerative changes or overuse’ for the upper body and spine in both the intra- and inter-rater groups, with the ‘Shoulder’ and ‘Cervical spine’ being the most common sites. There were very few or no recordings for the lower body. As with ‘Fractures’ and ‘Contractures’, agreement on the absence of ‘Degenerative changes or overuse’ was high for all locations in both groups. However, when raters identified the presence of ‘Degenerative changes or overuse’, there was considerable disagreement about the precise location in the inter-rater group. This led to a summed κ-score below the satisfactory level (Table 2B). In the inter-rater group, there was no clear relationship between location and reliability, other than that ‘Lower back/lumbar spine’ had the lowest agreement in both groups.

Scoliosis

‘Scoliosis’ showed almost perfect intra- and inter-rater reliability (Tables 2A and 2B). Regarding the method of assessment of scoliosis, ‘Plain radiographs in sitting’ had almost perfect inter-rater reliability, whereas ‘Observation in sitting’ and ‘Plain radiographs in standing’ had poor inter-rater reliability. The option ‘Observation in standing’ was not used at all.

There was perfect agreement in both groups for ‘Surgical treatment of scoliosis’. In intra-rater testing, the ‘Date of surgery’ for scoliosis was agreed upon in three of the four cases. In inter-rater testing, a date was recorded only once, by one rater, corresponding to no agreement.

Other musculoskeletal problems; specify

Intra-rater reliability was satisfactory (Table 2A), but inter-rater reliability was just below the satisfactory level (Table 2B). The ‘specify’ answers are listed in Table 3. The most frequently reported problems were related to pain (>30%). The others were tendon injuries, tendonitis, tendon-related surgery, osteomyelitis, osteoporosis, spinal stenosis, herniated discs, amputations and alloplastic surgery.

Table 3 All answers from specifications of ‘Other musculoskeletal problems’

Do any of the above musculoskeletal challenges interfere with your activities of daily living (transfers, walking, dressing, showers, etc.)?

Intra-rater reliability was κ=0.68 (Table 2A) and inter-rater reliability was κ=0.59 (Table 2B). If the two categories ‘yes, a little’ and ‘yes, a lot’ were merged into one category—‘yes’—this yielded an intra-rater reliability of κ=0.74 and an inter-rater reliability of κ=0.65.
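The effect of merging ‘yes, a little’ and ‘yes, a lot’ can be reproduced by recoding both raters’ answers before recomputing κ. The sketch uses unweighted Cohen’s κ and six hypothetical answer pairs, not study data. Its only purpose is to show that disagreements between the two ‘yes’ grades disappear after recoding. This is consistent with the rise in κ reported above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical answers."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category proportions
    p_expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n) for c in set(counts_a) | set(counts_b)
    )
    return (p_observed - p_expected) / (1.0 - p_expected)


# Recoding that merges the two 'yes' grades into a single 'yes' category
MERGE_YES = {"no": "no", "yes, a little": "yes", "yes, a lot": "yes"}

# Hypothetical answers from two raters for six participants (not study data)
rater_a = ["no", "yes, a little", "yes, a lot", "yes, a lot", "no", "yes, a little"]
rater_b = ["no", "yes, a lot", "yes, a lot", "yes, a little", "no", "yes, a little"]

kappa_three_levels = cohens_kappa(rater_a, rater_b)
kappa_merged = cohens_kappa([MERGE_YES[r] for r in rater_a], [MERGE_YES[r] for r in rater_b])
print(round(kappa_three_levels, 2), round(kappa_merged, 2))  # 0.5 1.0
```

In this toy example, every disagreement lies between the two ‘yes’ grades, so κ rises to 1.0 after merging. In the study, κ remained below 1 after merging, which means that some disagreements between ‘no’ and ‘yes’ persisted.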

Content validity

All feedback from the validation group interviews is shown in Table 4. There were no major suggestions for changes.

Table 4 Feedback from validation discussions of health-care professionals, as well as groups of individuals with spinal cord injury

Discussion

The ISCIMSBDS has satisfactory intra-rater reliability for all variables except ‘Date of fracture’ and ‘Method of documentation of HO’. As expected, reliability scores were higher for the intra-rater group than for the inter-rater group. Inter-rater reliability was satisfactory for 9 of the 12 main variables, but agreement was largely unsatisfactory for the sub-questions. Because different clinicians will be using this data set, agreement between raters is important. The following variables had unsatisfactory inter-rater reliability: ‘Date of fracture’, ‘Fragility fractures’, ‘Degenerative changes/overuse’, ‘Scoliosis, method of assessment’, ‘Other musculoskeletal problems’ and ‘Do any of the above musculoskeletal challenges interfere with your activities of daily living (transfers, walking, dressing, showers, etc.)?’. These variables are discussed in further detail below.

Reporting of fractures showed good reliability. ‘Fractures since spinal cord lesion’ could be rephrased as ‘Fractures since spinal cord injury’ to follow the terminology in the data set. Reliability for the date of fracture was below the satisfactory level. When raters agreed on the year, they also agreed on the day and month in only 50% of instances for intra-raters and 38% for inter-raters. When revising the data set, we suggest that only the year of fracture be recorded. Agreement between raters on fragility fractures was unsatisfactory. This may reflect difficulty in determining the cause of fractures, which in most cases occurred many years before assessment (median time 6 and 11 years for intra- and inter-raters, respectively).

Inter-rater reliability was unsatisfactory for ‘Degenerative changes’ or ‘Changes due to overuse’. This probably reflects a need to define these variables better in the data set. Pain and discomfort are common symptoms of degenerative changes or overuse, and they could cause differences in how raters interpret the variable.1 Pain owing to degenerative changes or overuse can be difficult to distinguish from other types of pain, such as neuropathic or visceral pain. A more detailed pain evaluation is covered in the International SCI Pain Basic Data Set.21 Individuals in this study may have had overuse-induced pain in the upper body from extended wheelchair use. The majority of the study population had American Spinal Injury Association Impairment Scale grade A, B or C, and a high proportion had cervical lesions (Table 1). The degenerative changes or overuse were located primarily in the upper body, with only a few instances in the lower body.

Scoliosis showed almost perfect agreement, but reliability was only moderate for the variable relating to the method of assessment. The option ‘Observation in sitting’ had the lowest reliability, and the option ‘Observation in standing’ was not used at all. The validation group suggested removing the sub-questions, and the results of this study support this suggestion. Alternatively, the options could be reduced to, for example, ‘Observation’ and ‘Radiography’.

‘Other musculoskeletal problems’ had unsatisfactory inter-rater reliability, suggesting that this variable is difficult to interpret. Some of the MSK problems reported (Table 3) could belong to the ‘Degenerative changes/overuse’ category. That variable also had moderate reliability, suggesting that raters disagreed about which symptoms should be listed in each of these two categories. The last variable, ‘Do any of the above musculoskeletal challenges interfere with your activities of daily living (transfers, walking, dressing, showers, etc.)?’, showed unsatisfactory inter-rater reliability. However, reliability improved to substantial when ‘yes, a little’ and ‘yes, a lot’ were merged into one category. This adjustment could be considered when revising the data set.

Very few raters indicated that participants had any neuro-musculoskeletal history before SCI, as captured by the first variable in the data set, ‘Neuro-Musculoskeletal history before spinal cord lesion’. It is therefore tempting to suggest removing this variable from the data set. However, we believe that this variable is important to retain. Prior neuro-musculoskeletal problems may become more frequent in the future as SCI becomes more common in the elderly, who are likely to have MSK problems such as spinal canal stenosis or spondylosis.

Contractures had overall satisfactory reliability, but inter-rater reliability was only moderate for the location of contractures in the lower extremities. This result probably reflects the differences between raters in their diligence when measuring range of motion.

Limitations of the study include the low frequency of reported disorders for some variables, meaning that agreement primarily reflected the absence of symptoms. It is therefore difficult to draw firm conclusions about these variables.
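This limitation can be illustrated with two hypothetical 2 × 2 tables that have the same crude agreement (96%) but differ in prevalence. The counts are invented for illustration only, and the function is a sketch rather than the study’s analysis code. When almost all observations are ‘absent’, chance agreement is high and the same crude agreement corresponds to a much lower κ than when presence and absence are balanced.

```python
def kappa_2x2(both_yes, a_only, b_only, both_no):
    """Crude agreement and Cohen's kappa from a 2 x 2 table of two raters' yes/no answers."""
    n = both_yes + a_only + b_only + both_no
    p_observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal 'yes' and 'no' proportions
    p_expected = ((both_yes + a_only) / n) * ((both_yes + b_only) / n) + (
        (b_only + both_no) / n
    ) * ((a_only + both_no) / n)
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts: (both yes, only rater A yes, only rater B yes, both no)
rare = kappa_2x2(1, 2, 2, 95)       # condition recorded in about 3% of observations
balanced = kappa_2x2(48, 2, 2, 48)  # condition recorded in about half of observations
print(rare)
print(balanced)
```

In both cases, crude agreement is 0.96. However, κ is about 0.31 for the rare condition and about 0.92 for the balanced one. For this reason, κ and crude agreement were reported together.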

The study populations differed across the four centers in their demographics, and there is a risk of selection bias if the populations were not representative. For example, the frequency of HO was lower in this study than reported in the literature.10, 11 Selection bias could also have arisen from the recruitment procedure. Content validity was not tested statistically or compared with a gold standard, because no gold standard was available. In addition, the focus group discussions involved a relatively small number of people.

Conclusion

Overall, the data set has acceptable reliability. Intra-rater reliability was satisfactory, and inter-rater reliability was satisfactory for 9 of the 12 main variables but largely unsatisfactory for many of the sub-questions. The variables ‘Date of fracture’, ‘Fragility fractures’, ‘Scoliosis, method of assessment’, ‘Other musculoskeletal problems’ and ‘Do any of the above musculoskeletal challenges interfere with your activities of daily living (transfers, walking, dressing, showers, etc.)?’ may need revising in the next version of the data set. The frequency of reported problems was low for some variables. This makes final conclusions more difficult, because agreement was primarily based on the absence of symptoms. Validity discussions suggested only minor changes to a number of variables.

Data archiving

There were no data to deposit.