Introduction

Spinal cord injury or disease (SCI/D) is associated with decreased quality of life (QoL) of the individuals involved [1]. QoL is therefore an important outcome when evaluating the effects of rehabilitation treatments for persons with SCI/D. Studies on QoL following SCI/D are abundant; they include objective and subjective evaluations of QoL and reflect its multidimensional nature [2]. Results from previous research are difficult to compare, however, because of variation in definitions of QoL and the measures used [3]. In response to this situation, the International SCI QoL Basic Data Set (QoL-BDS) was developed as a three-item self-report questionnaire as part of the International SCI Data Sets Project [4, 5]. It was designed to include a minimal number of data elements, to be collected in clinical practice and to be included in any SCI/D study, in addition to the preferred QoL measure, if applicable [4]. The QoL-BDS is recommended for use in clinical practice and research by the International Spinal Cord Society (ISCoS) and the American Spinal Injury Association (ASIA), and is included in the National Institute of Neurological Disorders and Stroke Common Data Elements Project [6].

The International SCI Data Sets were not designed to be used as measures. Nevertheless, they need to be valid and reliable, across the continuum of care and internationally [7]. The first data on cross-cultural validity of the QoL-BDS from the United States and Brazil were published in 2014 [8, 9]. After that a study from The Netherlands showed validity of the three items and the option to use a total QoL-BDS score [10]. Secondary analysis of merged data from five countries (these three plus Australia and India) provided further indication of its concurrent and divergent validity [11]. Finally, a small study of inter-rater reliability of several International SCI Data Sets in an inpatient rehabilitation setting showed good inter-rater reliability of the QoL-BDS items when administered by different physicians a few days apart [12].

Based on these encouraging results, a comprehensive prospective international validation project was planned to establish the reliability, validity and responsiveness of the QoL-BDS across four countries (Australia, Brazil, Netherlands and United States). For a set of data elements to be accepted as reasonably reliable and valid, information on its reliability and reproducibility must be reported.

The current study is part of this larger project and focuses on the reproducibility of the QoL-BDS. It answers the following question: What are the test–retest reliability and agreement between two points in time of the QoL-BDS, (a) in the whole sample; and (b) in subgroups with respect to age and etiology? It was hypothesized that the reliability of all three items and the total scale would be satisfactory, and that agreement would be good, in the whole sample and in these subgroups.

Methods

Design

Pre–post test. Repeated administration of the QoL-BDS to persons with SCI/D living in the community within a period of 14 days between the first test (T1) and the second (T2).

Participants

We aimed to include a total of 80 participants from five study sites: the University of Michigan (UM) in Ann Arbor, Michigan; Craig Hospital in Englewood, Colorado; De Hoogstraat Rehabilitation in Utrecht, The Netherlands; Caulfield Hospital in Melbourne, Australia; and Hospital das Clínicas in Sao Paulo, Brazil. Inclusion criteria were: at least 18 years of age at the time of the study, living with SCI/D for at least 1 year and not having substantial cognitive or psychiatric problems. We aimed for a balanced sample with respect to age (< 50 or ≥ 50 years) and etiology (SCI or SCD).

Procedures

Convenience sampling was performed at each site. Participants were recruited from people visiting the hospital for their regular follow-up, from the hospital’s medical files or registries, from the community, or from a combination of these. Potential participants received written and oral information about the study and signed written informed consent. The study was approved at all sites by their respective IRBs and Ethics Committees. The QoL-BDS was administered by the same rater in an oral interview or telephone interview during two time periods (T1, T2).

Instruments

QoL-BDS

The QoL-BDS fits the definition of subjective QoL, reflecting an individual’s overall perception of and satisfaction with how things are in his/her life [13]. It includes three items on the individual’s satisfaction with their life as a whole, physical health, and mental health. All items use a time frame of the past four weeks and a 0–10 numerical rating scale with higher scores indicating better QoL [4].

Including a fourth item on social life in the QoL-BDS was considered at the time of its development. It was decided not to include such an item because a separate International SCI Activities and Participation Data Set was developed simultaneously [14]. However, as described in greater detail elsewhere, cognitive interviews conducted with participants from all sites as part of an earlier phase of the current project made clear that the QoL-BDS would be incomplete without such an item [15]. Therefore, a fourth item to rate satisfaction with social life, with the same time frame and response scale, was asked in addition to the QoL-BDS.

The QoL-BDS was developed in English and this version was used in the USA and Australia. It had already been translated into Dutch and Brazilian Portuguese for use in previous studies [8, 9, 10], following the recommendations of the International SCI Data Sets project [7].

Demographic and injury/disease data questionnaire

In addition to the QoL-BDS, demographic and injury/disease-related questions were also asked at the first administration. Demographic information included age (in years), gender, marital status, education (years of formal schooling), and employment status. Questions on SCI/D characteristics included date of onset, etiology, level, and completeness. Since many participants could not indicate the completeness of their SCI/D, a question on the degree of voluntary movement below the level of the lesion (response categories: none, some, full) was used as a proxy measure for lesion completeness.

Analysis

Age was dichotomized as younger than 50 years or 50 years or older. Etiology was dichotomized into non-traumatic (SCD) or traumatic (SCI), marital status into married or not married, and employment status into employed or not employed. Differences between QoL-BDS item and total scores with respect to age and etiology were tested using the Mann–Whitney test. Differences between QoL-BDS scores at T1 and T2 were tested using the Wilcoxon Matched Pairs test.

Internal consistency reliability was examined using Cronbach’s alpha and inspection of corrected item-total correlations. For group comparisons, an alpha of at least 0.70 is “sufficient”, an alpha of 0.80 or higher is “good” and an alpha of 0.90 or higher is “excellent”. Corrected item-total correlations should be at least 0.40 [16].
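The two internal consistency statistics named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code of the study: the function names and the example ratings are hypothetical, and only the standard library is used.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per participant."""
    k = len(rows[0])  # number of items
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


# Hypothetical 0-10 ratings (life as a whole, physical health, mental health)
scores = [[7, 6, 8], [5, 4, 6], [9, 8, 9], [3, 4, 2], [6, 7, 7]]
print(round(cronbach_alpha(scores), 2))
print([round(corrected_item_total(scores, j), 2) for j in range(3)])
```

The "corrected" correlation removes the item itself from the total before correlating, so that an item does not inflate its own item-total correlation.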

Agreement of QoL-BDS scores at T1 and T2 was examined using weighted kappa for single items and intraclass correlation coefficients (ICC) for the total scores. A weighted kappa of 0.21–0.40 is considered fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect [17]. For the total scores, the two-way random effects model for absolute agreement was used. An ICC above 0.70 is “sufficient” and above 0.80 is “good” [18].
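As a rough illustration of the two agreement statistics, the sketch below computes a weighted kappa for one 0–10 item and a two-way random effects, absolute agreement ICC for a total score. Several details are assumptions rather than facts from the study: the text does not state whether linear or quadratic kappa weights were used (the hypothetical `power` parameter covers both), the function names are invented, and the example scores are made up.

```python
def weighted_kappa(t1, t2, n_categories=11, power=2):
    """Weighted kappa with disagreement weights |i - j| ** power.

    power=1 gives linear weights, power=2 quadratic weights (an assumption;
    the weighting scheme is not stated in the text).
    """
    n = len(t1)
    observed = sum(abs(a - b) ** power for a, b in zip(t1, t2)) / n
    m1 = [t1.count(c) / n for c in range(n_categories)]
    m2 = [t2.count(c) / n for c in range(n_categories)]
    expected = sum(
        m1[i] * m2[j] * abs(i - j) ** power
        for i in range(n_categories)
        for j in range(n_categories)
    )
    return 1 - observed / expected


def icc_absolute_agreement(scores):
    """Two-way random effects ICC for absolute agreement.

    scores holds one [T1, T2, ...] list per participant.
    Returns (single-measure ICC, average-measure ICC).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


# Hypothetical data: one item at T1 and T2, and total scores at T1 and T2
item_t1 = [7, 5, 9, 3, 6, 8]
item_t2 = [8, 5, 9, 4, 6, 7]
totals = [[21, 23], [15, 14], [26, 27], [9, 12], [20, 19], [24, 24]]
print(round(weighted_kappa(item_t1, item_t2), 2))
print([round(v, 2) for v in icc_absolute_agreement(totals)])
```

Because the absolute agreement form keeps the between-occasion variance in the denominator, a systematic shift between T1 and T2 lowers the ICC even when the rank order of participants is unchanged.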

Bland–Altman plots were used to analyze agreement of total QoL-BDS scores at T1 and T2 related to the mean scores of T1 and T2 in the whole sample. The “limits of agreement” were computed, defined as ± 1.96 SD of the difference score. This figure indicates the minimum difference between scores exceeding chance at an individual level [19]. Similarly, the limits of agreement at group level were calculated as ± 1.96 times the standard error of the difference score (SE = SD/√N). To express both figures in terms of effect size, both were divided by the SD of the baseline score. Cohen’s approach was used to interpret these effect sizes: 0.2 is “small”, 0.5 is “moderate”, and 0.8 is “large” [20]. Finally, we visually inspected the Bland–Altman plots for possible bias, meaning an association between the differences between the two scores and the mean of the two scores.
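The limits of agreement and their effect-size equivalents, as defined in the paragraph above, can be sketched as follows. The function name, the dictionary keys, and the example scores are hypothetical; the arithmetic follows the definitions given in the text (± 1.96 SD of the difference for individuals, ± 1.96 SD/√N for groups, both divided by the SD of the T1 score).

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman_summary(t1, t2):
    """Limits of agreement at individual and group level, plus effect sizes."""
    diffs = [b - a for a, b in zip(t1, t2)]
    n = len(diffs)
    sd_diff = stdev(diffs)
    sd_baseline = stdev(t1)
    loa_individual = 1.96 * sd_diff        # half-width, individual level
    loa_group = 1.96 * sd_diff / sqrt(n)   # half-width, group level
    return {
        "mean_diff": mean(diffs),
        "loa_individual": loa_individual,
        "loa_group": loa_group,
        "es_individual": loa_individual / sd_baseline,
        "es_group": loa_group / sd_baseline,
        # (mean of T1 and T2, difference) pairs: the points of the plot
        "points": [((a + b) / 2, b - a) for a, b in zip(t1, t2)],
    }


# Hypothetical total scores at T1 and T2
t1 = [21, 15, 26, 9, 20, 24, 18, 12]
t2 = [23, 14, 27, 12, 19, 24, 20, 13]
summary = bland_altman_summary(t1, t2)
print({k: round(v, 2) for k, v in summary.items() if k != "points"})
```

The group-level figure shrinks with √N, which is why a score difference that is far too small to be meaningful for one person can still exceed chance in a sample.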

All analyses except the Bland–Altman plots were performed for the individual item scores and the three-item and four-item total scores of the QoL-BDS. Results are presented for both total scores to evaluate the impact of adding a fourth item and to facilitate comparisons with other studies. Subgroup analyses were performed with respect to age (younger than 50 years vs. 50 years or older) and etiology (SCI vs. SCD). Because of the small sample sizes per country, no country-specific analyses were performed.

Results

Between 15 and 19 participants per site were included, for a total of 83. Four participants’ data were excluded after collection because their time between tests was outside of the established window, making the final sample 79.

The time between T1 and T2 was between 4 and 27 days (median 14 days, interquartile range 11–15 days) in the whole sample. Participant characteristics are displayed in Table 1. Sample composition varied substantially across sites. Three sites included predominantly males with SCI injured as young adults, whereas two other sites included more females, more often with SCD and injured at a higher age. Most participants had long-standing SCI/D (median 10 years). Most had paraplegia and 40.5% indicated no voluntary movement below the level of injury.

Table 1 Characteristics of the study group (N = 79)

At T1, the distributions of the total scores were approximately normal for both the three-item score (Skewness − 0.28, Kurtosis − 0.54) and the four-item score (Skewness − 0.24, Kurtosis 0.59). The distributions of the QoL-BDS item and total scores are displayed in Table 2. The score distributions of the three-item and the four-item total scores were similar.

Table 2 Descriptive statistics QoL-BDS by age and etiology

There were no significant differences in QoL-BDS-scores with respect to etiology or age group. Scores at T2 were generally slightly higher than at T1, but this difference was only significant (p < 0.01) for the item that asks for ratings in relation to life as a whole (Table 2).

Cronbach’s alpha values were good (range 0.84–0.86) for both total scores at both test occasions. All subgroup analyses also showed good alphas (range 0.82–0.90). All but one corrected item-total correlation were at least 0.40, with the social life item in the SCD subgroup at T1 (0.35) being the exception.

Test–retest reliability of all QoL-BDS scores is displayed in Table 3. The weighted kappa values of the four single items in the whole sample were all substantial. The ICC values of the three-item and four-item total scores were identical and good (both 0.83; 95% CI: 0.75–0.89). Subgroup analyses showed only one item with a weighted kappa value below 0.60 (physical health in the younger age group), and satisfactory to good ICC values of the two total scores in all subgroups.

Table 3 Test–retest reliability QoL-BDS items and total scores by age and etiology (mixed model; absolute agreement)

Agreement of the three-item and four-item total scores at T1 and T2 is displayed in Table 4 and Fig. 1a, b. Visual inspection of these plots suggested no bias for the three-item total score. However, the plot for the four-item total score showed a slightly decreasing trend with increasing mean scores, suggesting some bias. The limits of agreement of the two scores were similar, both showing that large score differences (effect size > 1.1) are needed to exceed these limits of agreement and thereby indicate change beyond random chance at the individual level. In contrast, small score differences (effect size > 0.1) are sufficient to show change beyond chance at group level.

Table 4 Bland–Altman analysis of agreement between QoL-BDS scores on T1 and T2
Fig. 1a, b Bland–Altman plots of the three-item and four-item scores

Discussion

This is the first prospective, multicenter, international psychometric evaluation of the QoL-BDS. The results provide evidence of its reproducibility in an international sample and in subgroups based on age and etiology of SCI/D. Internal consistency of the three-item and four-item total scores was good. Test–retest reliability was substantial for the four single items and good for the three-item and four-item total scores. The Bland–Altman analyses showed that the QoL-BDS is sensitive to small changes in QoL at the group level, but not at the individual level.

The addition of a fourth item on satisfaction with social life did not impact the reproducibility of the QoL-BDS. Also, the score distributions of the three-item and four-item versions were remarkably similar. We did not formulate a-priori hypotheses about possible differences, but it is reassuring that the extra item did not increase the heterogeneity of the measure. Future studies should clarify whether or not the extra item on satisfaction with social life increases the validity of the QoL-BDS.

The ICC values for the individual items in this study (0.66–0.80) were lower than the range of ICCs of 0.86–0.94 found in a Dutch study of rehabilitation inpatients [12]. The short time between tests in the latter study (median 4 days) could explain the higher reliability found in that study. We could not find other studies with which to compare our results.

The Bland–Altman analysis showed that the QoL-BDS is likely to be sensitive to change when used in trials or longitudinal cohort studies because the analysis would be conducted at a group level. The Bland–Altman analysis, however, also showed that large changes in scores are required to exceed chance at individual level. This implies that the QoL-BDS as currently designed might not be sufficiently sensitive to change in clinical practice. Repeated administrations would be recommended to increase its sensitivity to change in clinical practice. The SPSS-output showed that the ICC of the average of the two test administrations would be 0.91, which is above the recommended 0.90 required for use in individual patient care [16].
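The reported gain from averaging two administrations is consistent with the Spearman–Brown relation between a single-measure and an average-measure reliability coefficient. The sketch below is an illustration, not the SPSS procedure used in the study; the function names are hypothetical, and applying Spearman–Brown here is an assumption that happens to reproduce the reported value.

```python
from math import ceil


def spearman_brown(single_reliability, k):
    """Reliability of the average of k administrations."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


def administrations_needed(single_reliability, target):
    """Smallest number of administrations whose average reaches the target."""
    ratio = target * (1 - single_reliability) / (single_reliability * (1 - target))
    return ceil(ratio)


# Single-administration ICC of 0.83 as reported for the total scores
print(round(spearman_brown(0.83, 2), 2))   # 0.91
print(administrations_needed(0.83, 0.90))  # 2
```

Under this relation, two administrations are the minimum needed to pass the 0.90 threshold when the single-administration ICC is 0.83.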

Limitations

A few limitations of this study should be noted. First, the sample size per country was small and below the recommended sample size of 30 [7]. Therefore, we refrained from comparing test–retest reliability between countries. In the forthcoming full validation analyses, assessment of differences by country will be a focus.

Second, for five participants the test–retest interviews were not administered by the same interviewer. This, however, hardly influenced the results: for example, the ICC of the four-item total score increased only from 0.83 to 0.84 after exclusion of these five participants.

Finally, although data were collected in four countries in different parts of the world, there was no representation of low-income or lower-middle-income countries or of countries from Africa, the Middle East or Asia. It also remains unclear whether the test–retest reliability results of this study extend to the inpatient situation.

Conclusion

This study provides evidence of reproducibility of the QoL-BDS and suggests sensitivity to change in research investigations. The possible addition of an item on satisfaction with social life did not affect the internal consistency and reproducibility of this measure, and conceptually adds a dimension found to be important by persons with SCI. In the context of the continuing debate on the conceptualization and measurement of QoL, the QoL-BDS is a significant step toward unifying our ability to record and report this important information.

Data archiving

The data sets analyzed during the current study are available from the corresponding author on reasonable request.