Introduction

The Spinal Cord Independence Measure [1] and the Functional Independence Measure [2] scores (SCIM/FIM) are often used as outcome measures for cohort studies and clinical trials designed to determine the effectiveness of novel experimental interventions including pharmacological, biological, technological and other emerging interventions [3]. However, there is always the concern in these types of studies that SCIM/FIM are influenced not only by the experimental intervention but also by standard care, and particularly by the various physiotherapy interventions commonly administered. If this is the case, then in cohort studies, the type and amount of physiotherapy could confound the causal relationship between the experimental intervention (exposure) and the SCIM/FIM (outcome). For this reason, physiotherapy interventions would need to be controlled for in the design and/or analysis of the study. The same would need to occur in clinical trials that use any analysis apart from an intention-to-treat analysis (for example, a Complier Average Causal Effect analysis; see Supplementary File 1 for a more detailed explanation of these issues with accompanying Directed Acyclic Graphs).

To determine whether physiotherapy interventions could confound the causal relationship between novel experimental interventions and SCIM/FIM, we wanted to determine whether any physiotherapy interventions increase SCIM/FIM [3]. We acknowledge that understanding the effect of physiotherapy interventions on SCIM/FIM is only half of the equation because, to be a confounder, these interventions also need to determine (or in part determine) whether a person receives the novel experimental intervention. Nonetheless, understanding the effect of physiotherapy interventions on SCIM/FIM is an obvious first step to understanding whether any of these interventions could be a potential confounder in cohort studies and clinical trials involving novel experimental interventions. Irrespective of this, if physiotherapy interventions increase SCIM/FIM, variation in the physiotherapy that participants receive can reduce the precision of estimated treatment effects in cohort studies and clinical trials. Standardising physiotherapy would therefore provide one way to increase the precision of estimates without the need to increase the sample size. Therefore, the aim of this systematic review was to determine the effect of physiotherapy interventions on SCIM/FIM in people with SCI. Importantly, we did not set out to determine the effect of physiotherapy interventions on function per se. Our focus was specifically on SCIM/FIM to guide the design of cohort studies and clinical trials, which typically rely on SCIM/FIM as outcome measures.
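
The confounding argument above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the physiotherapy dose, the effect sizes, the noise level and the function name `simulate` are all hypothetical and are not estimates from any trial. The true effect of the novel intervention is set to zero, so any difference between exposed and unexposed people is bias or noise.

```python
import random
import statistics


def simulate(n, physio_drives_exposure, seed=1):
    """Return the naive exposed-minus-unexposed difference in outcome.

    All numbers are hypothetical. The true effect of the novel
    intervention is zero in this toy model.
    """
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        physio = rng.random()  # hypothetical physiotherapy dose, 0 to 1
        # Confounding requires physiotherapy to influence who is exposed.
        p_exposed = 0.2 + 0.6 * physio if physio_drives_exposure else 0.5
        is_exposed = rng.random() < p_exposed
        # The outcome depends on physiotherapy only (exposure effect = 0).
        outcome = 50 + 10 * physio + rng.gauss(0, 5)
        (exposed if is_exposed else unexposed).append(outcome)
    return statistics.mean(exposed) - statistics.mean(unexposed)


confounded = simulate(20000, physio_drives_exposure=True)
unconfounded = simulate(20000, physio_drives_exposure=False)
print(f"physiotherapy influences exposure:  {confounded:.2f}")
print(f"exposure independent of physiotherapy: {unconfounded:.2f}")
```

In this toy model a spurious difference appears only when physiotherapy influences both the outcome and who receives the novel intervention. When it influences the outcome alone, the estimate is unbiased, although the extra variation in outcome still costs precision.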

Methods

Searches were conducted from inception to May 2020 of the following databases: Embase, Medline and the Cochrane Central Register of Controlled Trials (all via the Ovid search engine). The Cochrane search strategy for identifying clinical trials [4] was combined with variations on the following terms to capture trials involving people with SCI: paraplegia, quadriplegia, tetraplegia, wheelchair and spinal cord; and to capture SCIM or FIM: spinal cord independence measure, SCIM, functional independence measure, FIM (wildcard characters were used to identify variations on these terms). Adjustments were made for each database and lines were added to the search strategy to exclude animal studies (see Supplementary File 2 for details of the search strategy). In addition, we scanned our own database of previously identified randomised controlled trials involving any physiotherapy interventions administered to people with SCI [5]. This database included our search results of the PEDro database for all physiotherapy interventions applied to people with SCI. It also includes trials that the authors have incidentally found over the years through many different sources including word-of-mouth. The titles and abstracts were independently screened by two people (LH, JC) and full papers of potentially eligible trials were then retrieved and again screened by the same two people. Disagreements were resolved by discussion and arbitration by a third person (JG). Trials that were not published in English were not considered for inclusion.

The inclusion criteria were as follows:

Participants

Only trials that included adults with a traumatic or non-traumatic SCI were included irrespective of time since injury. If trials included participants with conditions other than SCI, they were only included if 80% or more of the participants had a SCI.

Intervention

Trials were included if they involved a typical physiotherapy intervention such as some type of gait training, any form of exercise, passive interventions such as stretch or passive movements and hand therapy. Trials that also included surgery, pharmacological or psychological interventions were only included if these interventions were provided to the control and experimental groups in exactly the same way. Trials examining cranial or epidural stimulation or acupuncture were not included.

Comparator

Trials were included if they compared:

(i) a physiotherapy intervention with a sham or no intervention

(ii) a physiotherapy intervention with another physiotherapy intervention

Outcomes

Trials were only included if any version of SCIM and/or FIM were used as outcome measures. In trials that measured both SCIM and FIM, SCIM results were extracted and used for the primary analyses, but FIM results were also extracted for secondary analyses undertaken to determine whether our conclusions were robust to our preferential choice of SCIM over FIM. The SCIM/FIM results could be expressed in any way including total scores, sub-scores or scores of individual items but preference was given to total scores. If scores were provided of more than one item of the SCIM or FIM but a total score was not provided (see for example [6]), these scores were not tallied but instead data from the item reflecting the biggest treatment effect was extracted in the knowledge that this would bias the systematic review in favour of demonstrating a treatment effect. In trials that measured outcomes on more than one occasion, the outcomes measured at the first endpoint after the last treatment were used.

Trial design

Only randomised between-group and cross-over controlled trials were included.

Data extraction and synthesis

Data from each trial were extracted onto an Excel spreadsheet that had been designed and tested for the purpose. One author (LH) extracted the descriptive data including the design of the study, sample size, types of participants, the intervention and comparator and the details of the items of the SCIM and/or FIM that were measured. These data were then checked by a second author (JC). Two authors (LH and JG) then independently extracted the SCIM/FIM data and the third author (JC) arbitrated any differences. For the one paper on which LH was an author, JC and JG independently extracted the SCIM/FIM data.

The appropriate SCIM and/or FIM data were extracted in order to determine mean between-group differences and 95% confidence intervals (95% CI). If necessary, software (Pixelruler® [7]) was used to convert distances on graphs to SCIM and/or FIM scores. Preference was given to extracting mean (95% CI) between-group differences of post data adjusted for baseline scores (this requires assuming a common SD calculated using Review Manager: see section 6.5.2.3 of Review Manager [4]). If these data were not reported then mean (SD) change data were extracted (these data were not used in meta-analyses in which the results were expressed as Standardised Mean Differences (SMD) unless all included trials expressed results in the same way; see Section 10.5.2 of Review Manager [4]). As a last resort, mean (SD) post-intervention scores were extracted. In one trial [8], post data were extracted, even though the trial also provided change data, to enable pooling across trials with a SMD. For trials that only provided medians and interquartile ranges (IQR), the median was used as a mean, and the SD was estimated as the IQR divided by 1.35 [4]. There were only two cross-over trials and both only provided the data pooled for each treatment irrespective of order [9, 10]. These data were used in the analyses. Standard errors, 95% CIs, p values and any other appropriate combination of data or statistical results were converted into SDs using the calculator function of the RevMan software [11]. Data that were only presented for subgroups (e.g. people with complete and incomplete lesions) were not considered useable unless the data from each subgroup were provided or unless randomisation was stratified by the characteristic defining the subgroup.
For example, if authors only provided the data from a subgroup of participants with upper motor neuron (UMN) lesions and did not provide any data for those with a lower motor neuron (LMN) lesion, then the data from the subgroup with UMN were only used if randomisation was stratified by UMN and LMN lesion type. This was done to ensure the fidelity of randomisation was not lost. Where data for all subgroups were provided separately, they were combined using the STATA command of Stata v16 (see section 6.5.2.10 of Review Manager) [4].
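
The conversions described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the CI conversion uses a large-sample z value (small samples would call for a t value), and `combine_groups` uses the standard formula for merging two groups into one. Whether it matches the exact command the authors ran is an assumption.

```python
import math


def sd_from_iqr(q1, q3):
    """Approximate the SD as IQR / 1.35 (assumes roughly normal data)."""
    return (q3 - q1) / 1.35


def sd_from_se(se, n):
    """SD of a single group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """SD of a single group from the 95% CI of its mean (large-sample z)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two subgroups into a single group; returns (n, mean, SD)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
           + (n1 * n2 / n) * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


# Hypothetical subgroup summaries (not data from any included trial).
n, mean, sd = combine_groups(3, 2.0, 1.0, 2, 5.0, math.sqrt(2))
print(n, round(mean, 2), round(sd, 3))
```

The combined SD is larger than either subgroup SD whenever the subgroup means differ, because the formula adds the between-subgroup spread to the within-subgroup spread.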

The results of trials with similar comparisons were pooled through meta-analyses, provided there was no excessive clinical or statistical heterogeneity. The following issues were considered when making decisions about clinical heterogeneity: type of participants, type and intensity of the intervention and the design of the trials. Decisions about statistical heterogeneity were based on the I² statistic: trials were not pooled if the I² was >75% [4]. A random-effects model was used for all meta-analyses using RevMan v5.3 [11] and results were expressed as SMD to accommodate the differences in the reporting of the FIM and SCIM.
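
For readers unfamiliar with these quantities, the sketch below shows one common way to compute a SMD, a random-effects pooled estimate and the I² statistic. It is not the RevMan implementation: the function names are hypothetical, the SMD uses Hedges' small-sample correction with a common large-sample approximation for its standard error, and the between-trial variance uses the DerSimonian-Laird estimator, which is assumed here to correspond to the random-effects model the authors ran.

```python
import math


def hedges_g(n1, m1, sd1, n2, m2, sd2):
    """Standardised mean difference (Hedges' g) and an approximate SE."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * (m1 - m2) / sd_pooled
    # Large-sample approximation; exact formulas differ slightly.
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, se


def random_effects(effects, ses):
    """DerSimonian-Laird pooling; returns (pooled, SE, I2 in percent)."""
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), i2


# Two hypothetical, strongly conflicting trials (not data from this review).
pooled, se_pooled, i2 = random_effects([0.0, 1.0], [0.1, 0.1])
should_pool = i2 <= 75  # the review's rule: do not pool if I2 > 75%
print(round(pooled, 2), round(se_pooled, 2), round(i2, 1), should_pool)
```

In this hypothetical example the two trials disagree far more than their standard errors allow, so I² is well above 75% and the pooling rule would reject the meta-analysis.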

The PEDro rating scale was used to assess the risk of bias of all included trials. This scale rates the following ten items as satisfied or not satisfied: Random allocation, Concealed allocation, Baseline comparability, Blinding of subjects, Blinding of therapists, Blinding of assessors, Adequate follow-up, Intention-to-treat analysis, Between-group comparisons, Point estimates and variability. One author rated all the trials, a second author retrieved the PEDro scores from the PEDro website (the PEDro website is unique in that all trials included are provided with a score on the PEDro scale) and compared them to the first author’s ratings and scored any trials missing on the PEDro website. A third author resolved any disparities between the first author and the PEDro website or second author.
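
The scoring and comparison step can be sketched as follows, assuming (as the scores out of 10 reported in this review imply) that a trial's PEDro score is the number of the ten items rated as satisfied. The function names and the example ratings are hypothetical.

```python
PEDRO_ITEMS = [
    "Random allocation", "Concealed allocation", "Baseline comparability",
    "Blinding of subjects", "Blinding of therapists", "Blinding of assessors",
    "Adequate follow-up", "Intention-to-treat analysis",
    "Between-group comparisons", "Point estimates and variability",
]


def pedro_score(satisfied):
    """Count how many of the ten PEDro items are rated as satisfied."""
    unknown = set(satisfied) - set(PEDRO_ITEMS)
    if unknown:
        raise ValueError(f"not PEDro items: {sorted(unknown)}")
    return len(set(satisfied))


def disagreements(rater_a, rater_b):
    """Items rated differently by two sources, in scale order."""
    return [item for item in PEDRO_ITEMS
            if (item in rater_a) != (item in rater_b)]


# Hypothetical ratings of one trial by an author and by the PEDro website.
author = {"Random allocation", "Baseline comparability",
          "Between-group comparisons", "Point estimates and variability"}
website = author | {"Adequate follow-up"}
print(pedro_score(author), pedro_score(website), disagreements(author, website))
```

Any item returned by `disagreements` would be passed to the third author for resolution.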

Results

A total of 735 papers were retrieved from the searches. There were 429 papers after duplicates were removed (see Fig. 1). Twenty-four trials met the inclusion criteria. Another nine were identified from our own database. Ultimately, 33 trials [6, 8,9,10, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40] met the inclusion criteria but only 27 provided useable data and hence were included [6, 8,9,10, 12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30, 35, 38,39,40] (see Supplementary File 3, Table 1 for the details of the six trials that did not provide useable data; in brief, none of these trials reported a mean between-group difference or data to derive this value and its 95% CI).

Fig. 1: Flow chart.

Table 1 Details of the included trials.

The number of participants in the trials ranged from 7 to 116 with a median (interquartile range) of 30 participants (21–44) (see Table 1). Twenty-five of the 27 trials had a between-group design and two had a cross-over design [9, 10]. Two of the trials compared three groups [35, 40] and the other trials compared two groups.

Sixteen of the trials only measured SCIM [6, 8, 9, 15,16,17, 19,20,21,22, 25, 27, 29, 38,39,40], eight trials only measured FIM [10, 12, 13, 18, 26, 28, 30, 35] and three trials measured both SCIM and FIM [14, 23, 24]. The risk of bias as per each PEDro item is provided in Figs. 2–4. The median (IQR) PEDro score was 6.0 (4.0–7.0). The commonest sources of potential bias were failure to conceal allocation and failure to blind participants, therapists and assessors.

Fig. 2: Forest plot of trials comparing a physiotherapy intervention with no intervention or a sham intervention.

Either SCIM or FIM were measured. The results are expressed as Mean Differences. The results are based on post data in four trials [18, 23, 26, 28], change data in five trials [14, 15, 24, 29, 40] and between-group differences of post data adjusted for baseline scores in two trials [16, 19]. In one trial, data were reported separately for two subgroups of participants [24]. These data were combined (see Methods section). The risk of bias indicates high (red) or low (green) risk of bias on each of the following PEDro items: A random allocation, B concealed allocation, C baseline comparability, D participant blinding, E therapist blinding, F assessor blinding, G adequate follow-up, H intention-to-treat analysis, I between-group differences, J point estimates. Abbreviations: FES functional electrical stimulation, SCIM spinal cord independence measure, FIM functional independence measure, Pts points.

Fig. 3: Forest plot of trials comparing one type of gait training intervention with either another type of gait training intervention or another type of physiotherapy intervention.

Either SCIM or FIM were measured. The results are expressed as Mean Differences. The results are based on post data in eight trials [8,9,10, 12, 22, 27, 30, 38] and change data in two trials [25, 35]. In one trial, data were reported separately for two subgroups of participants [12]. These data were combined (see Methods section). In one study [35] it was assumed that the error bars were SDs but this was not clearly stated in the paper. Abbreviations: SCIM spinal cord independence measure, FIM functional independence measure, Pts points.

Fig. 4: Forest plot of trials comparing some type of general exercise ± gait training with another type of physiotherapy intervention.

Either SCIM or FIM were measured. The results are expressed as Mean Differences. The results are based on post data in four trials [6, 13, 21, 39], change data derived from participant-level data provided in a Supplementary file in one trial [20] and between-group differences of post data adjusted for baseline scores in one trial [17]. Abbreviations: SCIM spinal cord independence measure, FIM functional independence measure, Pts points.

Physiotherapy versus a sham or no intervention

Eleven trials compared a physiotherapy intervention to a sham or no intervention [14,15,16, 18, 19, 23, 24, 26, 28, 29, 40]. These trials examined the effectiveness of upper limb therapy [14, 16, 23, 24], gait training [19, 28], general exercise [15, 26, 29, 40] and arm positioning in bed [18] (see Fig. 2). Two of the trials that examined upper limb therapy were sufficiently similar to consider pooling (FES for upper limb training) [23, 24]. However, the results of these two trials could not be pooled in a meta-analysis because they used different components of SCIM scores, necessitating the use of a standardised mean difference, yet one trial provided post data [23] and the other trial provided change data [24]. Hence the results of all 11 trials are presented individually.
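
The reason post data and change data cannot be mixed under a SMD is that the same raw difference is divided by different SDs. The sketch below illustrates this with entirely hypothetical numbers. It assumes equal baseline and post SDs and a baseline-to-post correlation of 0.875, neither of which comes from the included trials.

```python
import math


def smd(mean_difference, sd):
    """Standardised mean difference: raw difference in SD units."""
    return mean_difference / sd


raw_difference = 4.0   # hypothetical between-group difference, in points
sd_post = 16.0         # hypothetical SD of post-intervention scores
correlation = 0.875    # hypothetical baseline-to-post correlation

# SD of change scores when baseline and post SDs are equal.
sd_change = math.sqrt(2 * sd_post**2 * (1 - correlation))

smd_post = smd(raw_difference, sd_post)
smd_change = smd(raw_difference, sd_change)
print(f"SMD from post data:   {smd_post:.2f}")
print(f"SMD from change data: {smd_change:.2f}")
```

With these numbers, the same 4-point difference yields a SMD twice as large when standardised by the SD of change scores, so the two values are not on a common scale.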

Two of the 11 trials indicated a statistically significant treatment effect [29, 40]. One trial compared a “sitting pivot transfer exercise program” and usual therapy with usual therapy alone for people who were wheelchair dependent for at least 6 months [29] and the other trial compared “advanced weight-bearing mat exercises” and electrical stimulation with no intervention in people with paraplegia for more than 1 year [40]. The mean (95% CI) between-group differences were 1.6/100 pts (0.7–2.4) and 4.1/100 pts (1.1–7.1), respectively. The 95% CIs indicated the possibility of trivially small effects of 0.7/100 pts or larger effects of 7.1/100 pts, respectively. The other nine trials failed to find statistically significant treatment effects. Their estimates were imprecise as reflected by wide 95% CIs. As such, they failed to rule out the possibility of clinically important treatment effects.

One type of physiotherapy intervention versus another type of physiotherapy intervention

Sixteen trials compared one type of physiotherapy intervention with another type [6, 8,9,10, 12, 13, 17, 20,21,22, 25, 27, 30, 35, 38, 39] (See Figs. 3 and 4).

Five of these trials [8, 12, 22, 30, 35] compared robotic gait training with overground gait training and were sufficiently similar in design, inclusion criteria and comparisons to consider pooling. However, they used combinations of SCIM and FIM, necessitating the use of a SMD and hence the use of either post or change data (but not both). Four of the five trials provided post data and were pooled [8, 12, 22, 30] (one trial only provided change data [35]). The pooled SMD of the four trials was 0.38 (95% CI, 0.08–0.67; p = 0.01, I² = 22%) favouring robotic gait training (see Supplementary File 3, Fig. 1). The estimate of the fifth trial was precise and failed to demonstrate a treatment effect with a mean (95% CI) between-group difference of −0.5/7 pts (−1.1 to 0.1) on the FIM locomotor item [35].

Another five trials examined the effectiveness of different doses of robotic gait training [25], or compared either robotic gait training or body-weight supported treadmill training (with or without electrical stimulation) versus one of the following: strength training [9], tilt table standing [10], passive lower limb movements [27] or body-weight supported treadmill training with general exercise [38] (see Fig. 3). None of these five trials demonstrated a treatment effect.

Four trials compared various interventions that did not involve gait training. They included strength training versus endurance training [13], activity-based therapy versus upper limb training [17], sitting balance training with and without virtual reality [21], and short-sitting balance training versus long-sitting balance training [6] (see Fig. 4). Three of the four trials demonstrated modest treatment effects [6, 13, 21]. Another two trials compared robotic upper limb training with conventional upper limb training [20, 39]. These two trials measured different outcomes: one measured SCIM total scores [20] and the other SCIM self-care sub-scores [39]. Consequently, the results needed to be expressed as a SMD but could not be pooled because one provided post data [39] and the other provided change data [20]. The two trials provided conflicting results. One did not demonstrate a treatment effect [39] and the other demonstrated a small treatment effect with a mean (95% CI) between-group difference of 9.3/100 pts (2.0–16.7) [20].

Sensitivity analysis

Three trials measured both FIM and SCIM [14, 23, 24]. There were no statistically significant between-group differences on SCIM scores in any of the three trials but there were statistically significant between-group differences on FIM scores in two trials [23, 24] (see Supplementary File 3, Fig. 2). These two trials both examined upper limb training with and without functional electrical stimulation. They were similar and therefore pooled in a meta-analysis. The pooled SMD was 1.31 (95% CI, 0.62–1.99; p = 0.0002, I² = 0%) indicating a notable treatment effect (see Supplementary File 3, Fig. 2A).
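
As a consistency check, the p values of the two pooled SMDs reported in the Results can be approximately recovered from their 95% CIs. The sketch below assumes a normal (z) test and a symmetric CI, and the function name is hypothetical. Because the reported estimates and limits are rounded, the recovered values are approximate.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the SE, z statistic and two-sided p value from a 95% CI."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p


reported = [
    ("robotic vs overground gait training", 0.38, 0.08, 0.67),
    ("upper limb training with vs without FES (FIM)", 1.31, 0.62, 1.99),
]
for label, estimate, lower, upper in reported:
    se, z, p = z_and_p_from_ci(estimate, lower, upper)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Rounded to the precision used in the text, both recovered p values agree with those reported (0.01 and 0.0002).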

Discussion

The motivation for this systematic review came from concerns that physiotherapy interventions could confound the causal relationship between exposure to a novel treatment approach (e.g. biological or pharmacological therapies and other emerging interventions) and SCIM/FIM in cohort studies and some analyses of clinical trials. The most notable finding was that robotic gait training compared with overground gait training increases SCIM/FIM scores with a SMD of 0.38 (95% CI, 0.08–0.67; see Supplementary File 3, Fig. 1). However, two of the four trials contributing to this meta-analysis had PEDro scores of 3/10 [22] and 4/10 [8] indicating high susceptibility to bias, and one trial [35] that failed to demonstrate a treatment effect was excluded from the meta-analysis. For these reasons, the results of the meta-analysis need to be interpreted with caution and summarised as initial weak evidence that robotic gait training increases SCIM/FIM scores. There was some limited evidence from six other trials to indicate that various physiotherapy interventions increase SCIM/FIM [6, 13, 20, 21, 29, 40]. Four of the six trials compared one intervention to another intervention [6, 13, 20, 21], and two compared one intervention to no intervention or a sham intervention [29, 40]. The types of interventions included robotic upper limb training [20], mat exercises with electrical stimulation [40], transfer training [29], strength training with arm cranking [13], and sitting balance training with and without virtual reality [6, 21]. A mix of SCIM and FIM were used in these trials. The size of the treatment effects was generally small (e.g. 9/100 points [20], 4/100 points [40], 2/100 points [29], 8/126 points [13], 1.5/7 points [6], 1.7/20 points [21]) but most of the estimates were reasonably precise as reflected by the narrow 95% CIs (see Figs. 2–4).
Two additional trials had contradictory results, demonstrating a statistically significant between-group difference on FIM (see Supplementary File 3, Fig. 2A) but not SCIM (see Fig. 2) [23, 24].

The PEDro scores suggested high susceptibility to bias in most trials. A common source of bias was the failure to blind participants and therapists, although it is acknowledged that this is rarely possible in trials of physiotherapy. On the one hand, the treatment effects of the positive trials may have been smaller (or even disappeared) if the trials had been less vulnerable to bias. On the other hand, the effect of these interventions might have been more pronounced if they were compared to no treatment rather than to an alternate physiotherapy intervention. The middle ground between these two extremes suggests that some of these physiotherapy interventions may have a small effect on SCIM/FIM.

The included trials used different combinations of SCIM and FIM scores with some trials providing total scores and others only providing sub-scales or scores of individual items. We prioritised total scores even though these captured some SCIM and FIM items that are unlikely to be affected by physiotherapy interventions. We reasoned that the inclusion of items unaffected by physiotherapy should not hide a treatment effect on other items of the SCIM and FIM although it may diminish the size of the treatment effect. This approach is also justified given the aim of this systematic review was to guide future large studies designed to determine the effect of experimental interventions including pharmacological, biological, technological and other emerging interventions. These types of studies are most likely to use total FIM or SCIM scores, not sub-scores.

The failure of most trials to demonstrate a statistically significant between-group difference should not be interpreted as evidence that physiotherapy interventions do not increase SCIM/FIM, for a number of reasons. First, some of the negative trials were too small to rule out the possibility of clinically meaningful treatment effects. This is evident from the width of the 95% CI of the mean between-group differences. For example, Wirz et al. [25] failed to rule out the possibility of a 21/40 point increase on the SCIM mobility sub-score (as indicated by the upper bound of the 95% CI; see Fig. 3). Secondly, the failure of one treatment compared to another treatment to demonstrate a treatment effect does not tell us anything about a treatment compared to no treatment. The two treatments may be equally effective (or ineffective). Thirdly, the results may reflect the difficulty of trying to demonstrate a treatment effect from any one physiotherapy intervention alone. It may be that many different physiotherapy interventions need to be administered as part of a package of treatments to change SCIM/FIM. It is also likely that the effect of physiotherapy interventions on SCIM/FIM is strongly influenced by many other factors including time since injury, type of injury and treatment dosage. There was considerable variability in all these factors across the different studies (see Table 1). Lastly, two trials used a cross-over design [9, 10], which may not have been appropriate despite the authors’ claims that there was no evidence of a carry-over effect.

Of course, this systematic review may not provide a full picture of the available evidence. For example, the exclusion of papers not published in English may have biased our results. It is also possible that there are trials that were conducted but never published, although these are more likely to be negative trials than positive trials. Importantly, there are many more measures of function than merely SCIM/FIM. The results of this systematic review therefore only summarise the effect of physiotherapy on SCIM/FIM, not on function per se (that was not the purpose of this systematic review).

In all, this systematic review provides initial weak evidence to suggest that robotic gait training and a few other physiotherapy interventions increase SCIM/FIM scores. This limited evidence is surprising because anecdotal evidence strongly suggests that many different types of physiotherapy interventions increase SCIM/FIM. However, and importantly, the lack of evidence should not be interpreted as evidence of no effect. Until further studies are conducted it may be prudent to control for the possible confounding effects of all physiotherapy interventions in clinical trials and cohort studies. However, it should be clear that at this point in time this is an assumption that is not based on high-quality evidence.