Introduction

The CONSORT (Consolidated Standards of Reporting Trials) guidelines are internationally recognised guidelines for the reporting of randomised controlled trials and have now been available for nearly 20 years.1 The primary purpose of the CONSORT guidelines is to improve the reporting of randomised controlled trials and to increase transparency. Transparency is important for determining the scientific rigour of a trial, interpreting the results of a trial and assessing a trial’s susceptibility to bias. Inadequate reporting of randomised controlled trials is associated with biased and often inflated treatment effects.2

The first version of the CONSORT guidelines was published in 1996,1 updated in 2001 and revised again in 2010.2 There are now also CONSORT guidelines specifically for non-pharmaceutical trials, pragmatic trials, and non-inferiority and equivalence trials. The 2010 CONSORT guidelines have 37 items covering a range of methodological and reporting issues. Some items are specifically devoted to the analysis, reporting and interpretation of results. For example, item 17a includes recommendations about the reporting of effect sizes and the precision of treatment estimates. Other items of the CONSORT guidelines, such as items 6b, 23 and 24, cover issues important for guarding against selective reporting of results. Item 6b (any changes to trial outcomes after the trial commenced, with reasons) recommends reporting any changes to trial outcomes from those documented in the trial protocol. Items 23 and 24 require authors to provide details of the trial registration and where the trial protocol can be accessed, respectively.

Nearly 400 journals have endorsed the CONSORT guidelines and recently 28 rehabilitation journals announced that they would ensure trials published in their journals adhered to the CONSORT guidelines (and other guidelines for the reporting of research) by January 2015.3 We were interested in this initiative, particularly as it relates to randomised controlled trials of physical interventions for people with spinal cord injuries (SCI). Other researchers have examined the reporting of randomised controlled trials for other interventions and conditions,4, 5, 6, 7, 8 but have not looked at physical interventions for people with SCI. Therefore, the primary aim of this study was to determine how well randomised controlled trials of physical interventions for people with SCI adhere to the CONSORT guidelines. The secondary aims were to determine whether the reporting of randomised controlled trials has improved over the past 10 years and whether it is realistic to expect adherence to the CONSORT guidelines by January 2015.

Methods

Search strategy

The following databases were searched for publications between January 2003 and December 2013: Medline, CINAHL, Embase, the Cochrane Central Register of Controlled Trials and the Physiotherapy Evidence Database (PEDro). A search strategy for randomised controlled trials6 was used along with the following terms: parapleg$, quadripl$, tetrapleg$, wheelchair$ and spinal cord. This search strategy was adjusted for each database.

Inclusion criteria

The inclusion criteria were as follows:

Type of trials: Randomised controlled trials written in English. Cross-over trials were included provided allocation to the treatment schedule was randomised.

Type of participants: Trials in which at least 75% of participants had sustained an SCI and were adults. There were no restrictions on the basis of time since injury or type of injury, but trials involving predominantly children were excluded.

Type of interventions: Trials involving the administration of a physical intervention typically provided by a physiotherapist. Only trials that involved a treatment administered over more than one occasion were included. Trials that examined the effectiveness of education, support programs, equipment and strategies for the management of respiratory or skin problems were excluded.

Type of comparisons: Trials involving any type of comparison provided at least one group received a physical intervention.

Types of outcomes: Trials involving any physical or non-physical outcome measures.

If trials were published more than once or interim analyses were published before the completion of the trial, then the most recent publication was retrieved.

Data collection

Two reviewers screened publications by title and abstracts. Full copies of potentially eligible trials were retrieved and again screened for eligibility. Any disagreements between the two reviewers were resolved by a third independent reviewer.

Two of four reviewers independently assessed the reporting of each trial according to the 2010 CONSORT checklist (see Figure 1) using the detailed explanation of the CONSORT guidelines.2 One of the four reviewers rated all trials. A third and sometimes fourth reviewer arbitrated any disagreements between the ratings of the original two reviewers. Each of the 37 items on the CONSORT checklist was scored as ‘fully reported’, ‘partially reported’, ‘not reported’, ‘not relevant’ or ‘not reported but unable to determine if relevant/done’ according to the following criteria:

Figure 1

The percentage of trials that ‘fully reported’, ‘partially reported’ or ‘not reported’ each of the items on the CONSORT checklist. If it was not clear if an item was relevant and this could not be determined by a trial registry or protocol, then the item was rated as ‘not reported but unable to determine if relevant/done’. Some items were ‘not relevant’.

Fully reported: This rating was used if all aspects of an item were described. Importantly, the ratings were not based on whether recommendations for good randomised controlled trial design had been followed, but rather based on whether it was clear what had been done. For example, item 11a (who was blinded and how (participant, care provider, assessors, data analysts)) was rated as ‘fully reported’ even if researchers, participants, health-care providers and data analysts were not blinded provided this was clearly stated.

Partially reported: This rating was used when some but not all aspects of an item were addressed. For example, item 6a requires authors to describe primary and secondary outcomes as well as provide details about how and when these outcomes were assessed. If authors described primary and secondary outcomes but did not provide sufficient details about how and when they were assessed then the ‘partially reported’ rating was used.

Not reported: This rating was used if an item was not reported and should have been reported (i.e. the item was relevant). For example, item 6b requires authors to specify whether any outcomes had changed since the commencement of the trial. If authors did not make any comment about changes in outcomes then this was cross-checked against trial registries or publicly available protocols where possible. If the outcomes reported in the trial registries did not match the outcomes reported in the trial then it was assumed that there had been changes to the outcomes which were ‘not reported’.

Not reported but unable to determine if relevant/done: This rating was used when it was not clear whether the failure to include details of an item reflected an important omission or reflected that the item was not relevant. For example, item 6b was rated as ‘not reported but unable to determine if relevant/done’ if there was no comment about changes in outcome measures but the trial was not registered and there was no way to confirm whether the outcomes had changed. The ‘not reported but unable to determine if relevant/done’ rating was used as an option for item 3b (important changes to methods after trial commencement (such as the eligibility criteria), with reasons), item 6b (any changes to trial outcomes after the trial commenced, with reasons), item 7b (when applicable, explanation of any interim analyses and stopping guidelines), item 12b (methods for additional analyses), item 14b (why the trial ended or was stopped), item 18 (results of other analyses distinguishing pre-specified from exploratory), item 19 (all important harms or unintended effects in each group) and item 25 (sources of funding and other support (such as supply of drugs), role of funders).

Not relevant: This rating was only used for items that were clearly irrelevant, namely 11b (description of the similarity of interventions), 14b (why the trial ended or was stopped) and item 17b (for binary outcomes, presentation of both absolute and relative effect is recommended). For example, item 11b requires authors to provide details about sham or placebo interventions to enable readers to judge the effectiveness of participant and/or health-care provider blinding. However, if trials did not utilise sham or placebo interventions, then this item was not relevant. Similarly, item 17b was only relevant if binary outcomes were used.
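The rating scheme above can be summarised as a small data structure. The following is a minimal sketch rather than the tool the reviewers used: the names `Rating`, `UNABLE_ITEMS`, `NOT_RELEVANT_ITEMS` and `allowed_ratings` are hypothetical, and the item lists are copied from the criteria described above.

```python
from enum import Enum


class Rating(Enum):
    """The five ratings described in the criteria above."""
    FULLY = "fully reported"
    PARTIALLY = "partially reported"
    NOT_REPORTED = "not reported"
    UNABLE = "not reported but unable to determine if relevant/done"
    NOT_RELEVANT = "not relevant"


# Items for which each special rating was available, as listed in the text.
UNABLE_ITEMS = {"3b", "6b", "7b", "12b", "14b", "18", "19", "25"}
NOT_RELEVANT_ITEMS = {"11b", "14b", "17b"}


def allowed_ratings(item: str) -> set:
    """Return the ratings a reviewer could assign to a CONSORT item."""
    allowed = {Rating.FULLY, Rating.PARTIALLY, Rating.NOT_REPORTED}
    if item in UNABLE_ITEMS:
        allowed.add(Rating.UNABLE)
    if item in NOT_RELEVANT_ITEMS:
        allowed.add(Rating.NOT_RELEVANT)
    return allowed
```

Under this sketch, item 14b is the only item that appears in both special lists, which mirrors the criteria as written.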

Analysis

The number of CONSORT items that were ‘fully reported’ in each trial was first tallied to derive a total score for each trial ranging from 0 to 37. This analysis was repeated for the number of CONSORT items that were ‘fully reported’ or ‘not relevant’ or ‘not reported but unable to determine if relevant/done’. This second tally was performed to provide a best-case analysis. Descriptive analyses were then used to determine the median (IQR) number per trial of CONSORT items that were ‘fully reported’ and the median (IQR) number per trial of CONSORT items that were ‘fully reported’ or ‘not relevant’ or ‘not reported but unable to determine if relevant/done’. The number of ratings for each item of the CONSORT was also tallied.
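The two tallies and their summary statistics can be sketched as follows. This is an illustration only: the function names are hypothetical, the ratings are assumed to be stored as plain strings keyed by item, and the quartile method (`method="inclusive"`) is an assumption because the method used to compute the IQR is not stated.

```python
from statistics import median, quantiles

FULLY = "fully reported"
# Ratings counted in the best-case tally.
BEST_CASE = {
    FULLY,
    "not relevant",
    "not reported but unable to determine if relevant/done",
}


def trial_scores(ratings):
    """Return (strict, best_case) tallies for one trial.

    `ratings` maps a CONSORT item label (e.g. "6b") to its rating string.
    """
    strict = sum(1 for r in ratings.values() if r == FULLY)
    best_case = sum(1 for r in ratings.values() if r in BEST_CASE)
    return strict, best_case


def median_iqr(scores):
    """Return the median and (Q1, Q3) of per-trial scores.

    Assumption: inclusive quartiles; needs at least two scores.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), (q1, q3)
```

Applying `median_iqr` once to the strict tallies and once to the best-case tallies yields the two summaries described above.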

A visual analysis of the data was used to determine whether the quality of reporting of randomised controlled trials has improved over the past 10 years by examining the relationship between the median number of CONSORT items that were ‘fully reported’ per trial and year of publication. A similar analysis was used to explore whether changes in the quality of reporting of randomised controlled trials may be partly explained by changes in the number of publications per year. This second visual analysis examined the relationship between the number of randomised controlled trials published per year and year of publication.
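The quantities inspected in these two visual analyses can be sketched as a per-year summary. This is a hypothetical helper, not the authors' procedure: it assumes each trial is represented as a (year, number of ‘fully reported’ items) pair and only prepares the values that would then be plotted against year.

```python
from collections import defaultdict
from statistics import median


def summarise_by_year(trials):
    """Group trials by publication year.

    `trials` is an iterable of (year, n_fully_reported) pairs.
    Returns, for each year, the number of trials and the median
    number of 'fully reported' items per trial.
    """
    by_year = defaultdict(list)
    for year, score in trials:
        by_year[year].append(score)
    return {
        year: {
            "n_trials": len(scores),
            "median_fully_reported": median(scores),
        }
        for year, scores in sorted(by_year.items())
    }
```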

Results

In all, 11 883 trials were retrieved from the search strategy. Of these, 79 were identified as potentially relevant, but only 53 met the inclusion criteria.9–62 The other 26 were excluded because they were reports of interim analyses, protocols, duplicates or secondary publications, or because they did not involve the type of intervention a physiotherapist would typically administer (e.g. transcranial magnetic stimulation).

The details of the interventions, the results and other aspects of the trials were not the focus of this paper, but in brief the trials looked at a range of physical interventions including different types of gait training, various forms of electrical stimulation, exercise therapy, fitness training, upper limb training and more. The number of participants in the trials ranged from 5 to 146. Participants had a mix of acute and chronic injuries with all patterns of neurologic loss. The journals in which the trials were published are detailed in Table 1. The journals that published the most trials were Archives of Physical Medicine and Rehabilitation (n=11), Spinal Cord (n=8) and Journal of Physiotherapy (previously Australian Journal of Physiotherapy) (n=5). The majority of trials concluded that the experimental intervention was effective, although this conclusion was not always supported by the results.

Table 1 The journals that published two or more of the 53 retrieved randomised controlled trials

Table 2 and Figure 1 show the reporting of each CONSORT item. The median (IQR) number of CONSORT items per trial that was ‘fully reported’ was 11/37 (7–20). The median (IQR) number of CONSORT items per trial that was either ‘fully reported’ or ‘not relevant’ or ‘not reported but unable to determine if relevant/done’ was 20/37 items (17–27).

Table 2 The number of trials that ‘fully reported’, ‘partially reported’ or ‘not reported’ each of the items on the CONSORT checklist

The items of the CONSORT checklist that were ‘fully reported’ in the greatest number of trials were item 2a (scientific background and explanation of rationale, n=52), item 2b (specific objectives or hypotheses, n=38), item 4a (eligibility criteria for participants, n=40), item 5 (details about the intervention for each group including how and when they were administered, n=50) and item 12a (statistical methods used to compare groups for primary and secondary outcomes, n=45). However, of these, the only item that was ‘fully reported’ in all but one trial was item 2a (scientific background and explanation of rationale). The items that were never ‘fully reported’ were item 1b (structured summary of trial design, methods, results and conclusions), item 10 (who generated the random allocation sequence, enrolled participants and assigned participants), item 11a (who was blinded and how (participants, care providers, assessors, data analysts)) and item 19 (all important harms or unintended effects in each group).

There was no obvious relationship between the number of trials published per year and year of publication (see Figure 2). Similarly, there was no obvious relationship between the mean number of CONSORT items that were ‘fully reported’ in trials and the year of publication (see Figure 3).

Figure 2

The number of trials published each year between 2003 and 2013.

Figure 3

Graphical representation of the quality of reporting of trials between 2003 and 2013. Each dot represents a trial, its year of publication and the number of CONSORT items that it ‘fully reported’. Some trials are not captured in the figure because they did not ‘fully report’ any CONSORT items.

Discussion

The results of this study indicate poor adherence to the CONSORT guidelines in trials examining the effectiveness of physical interventions for people with SCI. This is of concern because transparent reporting of randomised controlled trials is important for determining the scientific rigour of a trial, interpreting the results of a trial and assessing a trial’s susceptibility to bias. It also raises questions about how realistic it is to expect trials to adhere to the CONSORT guidelines by January 2015, as recently recommended by 28 leading rehabilitation journals,3 many of which publish trials in this area.

Surprisingly, fewer than half the trials ‘fully reported’ the CONSORT items dealing with the randomisation procedures (items 8a, 8b, 9 and 10; see Table 2). For example, details were rarely provided about the type of randomisation used (e.g. blocked, simple, stratified) and no trial clarified all three issues related to item 10 (namely, who generated the random allocation sequence, who enrolled participants and who assigned participants). Similarly, only 18 trials ‘fully reported’ item 6a even though this item covers an important aspect of randomised controlled trials, namely the reporting of primary and secondary outcomes. The most common omission was the failure to distinguish between primary and secondary outcomes. Sometimes as many as 25 outcomes were reported at many different end points without clarification about which outcome and end point were the focus of the trial. Alternatively, the reporting of outcomes was split across multiple publications, making it difficult to know how many outcomes were included in a single trial and which outcome was the primary one. For example, one trial reported one primary and one secondary outcome, although it was clear in subsequent publications that there were considerably more outcomes than initially declared. In addition, it was often difficult to know whether primary and secondary outcomes were pre-specified or selected after data analysis. These shortcomings are not dissimilar to those of trials in some areas of medicine. For example, a study in 2010 found that only 53% of trials specified a primary outcome.63 The failure to describe outcomes adequately and pre-specify a primary outcome is a concern because of the potential for bias.

Item 15 (a table showing baseline demographic and clinical characteristics of each group) was ‘fully reported’ in 33 trials. These reasonably good results disguise an important and ongoing problem in this area, namely, the use of statistical tests to demonstrate that groups are comparable at baseline. As early as 1985 one of the world’s leading randomised controlled trialists stated that ‘performing a significance test to compare baseline variables is to assess the probability of something having occurred by chance when we know that it did occur by chance. Such a procedure is clearly absurd’ (p 126),64 and then in 1994 another leading statistician stated ‘the practice can accord neither with the logic of significance tests nor with that of hypothesis tests’ (p 1716).65 Numerous papers in statistical and epidemiologic journals66, 67 continue to advise against performing statistical comparisons of groups at baseline but the practice continues in most rehabilitation journals. It is perhaps now timely for rehabilitation journals to ensure that authors do not perform statistical tests on baseline data.

Item 22 (interpretation consistent with results, considering benefits, harms and other relevant evidence) was only ‘fully reported’ in 22 trials, yet this item is perhaps the one most readers rely upon when trying to implement the evidence. This item requires authors to provide an interpretation of their data that is consistent with their results and to consider other relevant evidence without ‘being limited to studies that support the results of the current trial’ (p 22).2 Yet, sometimes statistically significant within-group changes in one group but not the other were interpreted as evidence of treatment effectiveness, or statistically significant pre- to post-intervention improvements in both groups were provided as evidence that both interventions were equally effective. Equally problematic were authors’ claims of treatment equivalence on the basis of no statistically significant between-group differences. Some authors implied that the trial was designed to determine treatment equivalence, but it was not clear if this was an a priori hypothesis or a hypothesis formulated on the basis of the results. Yet, this distinction has important implications for the design of trials and interpretation of results.68, 69, 70 It was also common for the interpretation of results to be heavily weighted to results of subgroup analyses, secondary outcomes or the results of subitems of outcomes. For example, in one study the FIM was specified as a primary outcome but the only positive finding was in the self-care subscores of the FIM. This was given undue emphasis in the interpretation of the results. Of course, interpretation of results will always be subjective and readers may disagree with some of our interpretation of trial results just as we disagree with the interpretation of others; however, our findings highlight some of the common misuses of statistics that are not unique to this area of research. It will be difficult for journals to turn this around in a short time frame to ensure the reporting of randomised controlled trials adheres to the CONSORT guidelines.

Item 23 deals with trial registration. This item was only ‘fully reported’ in 10 trials, although another four trials appear to have been registered retrospectively after publication. Likewise, only 2 of the 53 trials provided details about where the full trial protocol could be viewed (item 24). Trial registration and protocols are important for limiting undeclared changes to the analyses, primary outcomes and minimally worthwhile treatment effects, all of which are sources of bias. Trial registration was mandated by the International Committee of Medical Journal Editors in 2004 and the WHO states that ‘the registration of all interventional trials is a scientific, ethical and moral responsibility’.71 Clearly, more work needs to be undertaken to make authors and journals aware of the need to register trials prospectively, and to encourage editors and reviewers to check trial details against those provided in registries.

There have been some clear improvements in the reporting of specific items of the CONSORT guidelines over recent years. For example, items 13a (number of participants for each group randomly assigned, treated and analysed for primary outcome) and 13b (for each group, losses and exclusions after randomisation, together with reasons) were more likely to be ‘fully reported’ in recent years even though they were only ‘fully reported’ in 15 and 32 trials, respectively. The better reporting of these items perhaps reflects the more widespread inclusion of the CONSORT flow diagram, which captures some aspects of these items.

Recently, a consortium of 28 rehabilitation journals agreed that ‘by January 1, 2015 [they]…. will have worked through implementation and the mandatory use of guidelines and checklists….’ (p 415)3 including the CONSORT guidelines. This is in line with the recommendations of the International Committee of Medical Journal Editors back in 2004.72 This initiative is to be applauded; however, the results of our study suggest that authors of randomised controlled trials in SCI are unlikely to comply by this deadline. There are two main reasons for believing this. First, a number of the CONSORT items need addressing at the design stage of trials. For example, primary outcomes need to be specified in protocols (item 6a) and trials need to be registered before commencement (item 23). This requires a 2- to 5-year lead-in time. Second, compliance with some of the CONSORT items will require a change in authors’ approach to statistical analyses and this may meet resistance. For example, item 17a requires authors to report between-group point estimates with measures of variability for each continuous outcome (e.g. 95% confidence interval). This item was only ‘fully reported’ in 11 trials. Those trials that did not fully report this item often reported only P-values, or provided point estimates with measures of variability for each group before and after an intervention, and sometimes the accompanying point estimate of the change within each group, but not point estimates of the between-group differences. In one trial, pre- to post-intervention changes within each group were expressed as Cohen’s d and were called ‘effect sizes’. These types of issues will be difficult to address and turn around in a short time frame because they reflect an attitude and approach to statistical analysis. However, if authors can be persuaded to report results as recommended by items 17a and 17b of the CONSORT guidelines, then this alone will make a substantial difference to the transparency and interpretation of trials, and will make it considerably easier to summarise results in future systematic reviews and clinical practice guidelines.
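The distinction between within-group changes and a between-group estimate can be illustrated with a short sketch. It is not drawn from any of the reviewed trials: the function name is hypothetical, the inputs are assumed to be each participant's outcome (or change score) in two independent groups, and the interval uses a large-sample normal approximation, whereas small trials would normally use a t distribution or a model-based estimate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def between_group_difference(treatment, control, level=0.95):
    """Mean between-group difference with an approximate confidence interval.

    Assumptions: two independent groups, unequal variances allowed,
    normal approximation for the interval (illustrative only).
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)
```

A single estimate of this kind, with its interval, is what item 17a asks for; separate pre- to post-intervention tests within each group do not provide it.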

There are many practical challenges for journals if they plan to gatekeep adherence to the CONSORT guidelines because the guidelines are complex and cover a large number of different aspects of trial design and reporting. For example, item 1b about the abstract covers so many different features that it now has its own separate guidelines to explain this one item.73 It will also be difficult for journals to know whether omission of some details relevant to particular items of the CONSORT guidelines reflects an oversight that needs addressing or reflects that the item is irrelevant. For example, it is difficult to know whether trials are stopped early unless authors specifically clarify that they were not (item 14b). However, there are some items of the CONSORT guidelines that journals could easily enforce. For example, it would be fairly simple for journals to insist that the title of trials includes the words ‘randomised trial’ (item 1a), yet only 16 trials did. Similarly, journals could readily insist that trials satisfy item 3a (description of trial design (such as parallel, factorial) including allocation ratio), item 4b (settings and locations where data were collected) and item 14a (dates defining the period of recruitment and follow-up).

Often it is argued that word limitations prevent authors from fully explaining the details of trials and adhering to the CONSORT guidelines.74 However, previous audits of randomised controlled trials have found that reporting is no better in journals without word limitations. If word limitations are the source of inadequate reporting, then journals could either link to trial protocols or provide a supplementary online repository for study details.7, 72

There are four main limitations of this study. First, the results of this study may not be an accurate representation of the real situation because we may have failed to identify all trials meeting our inclusion criteria. However, we were more likely to find trials from higher-impact journals than lower-impact journals, and the quality of reporting in trials is likely to be better in higher- than lower-impact journals. So if we missed trials, we have probably overestimated rather than underestimated compliance with the CONSORT guidelines. The second limitation of our study is that we rated some of our own trials. We may have been tempted to overstate compliance with the CONSORT guidelines of our own trials, although we had nothing to gain from this because we have not indicated which of the 53 trials did and did not fully report each item of the CONSORT guidelines. The third limitation of our study is that our ratings may not be accurate. Sometimes it was difficult to know whether authors had fully reported items of the CONSORT guidelines. For example, item 12a requires authors to report the statistical methods used to compare groups. However, it was not always clear whether the statistics described in the methods section of trials referred to within- or between-group comparisons. For example, in at least two trials, authors described using paired t-tests, but it was not apparent until the results that these were within-group comparisons (i.e. testing for differences between pre- and post-intervention data within a group). In another trial, it was difficult to interpret the statistical section, but on examination of the results it appeared that the authors combined results of both groups in the trial and compared these to the results of a different study, and did not conduct any between-group comparisons, raising obvious questions about the suitability of a randomised controlled trial design. The fourth limitation is that we used the 2010 version of the CONSORT guidelines to rate trials and some trials were published before 2010. However, the changes made in 2010 were not substantial.75 Most related to creating subitems from items, and changing wording to increase consistency in style and to simplify and clarify the text. The only additional items were: item 3a (description of trial design (such as parallel, factorial) including allocation ratio), item 6b (any changes to trial outcomes after the trial commenced, with reasons), item 14b (why the trial ended or was stopped), item 23 (registration number and name of trial registry), item 24 (where the full trial protocol can be accessed, if available) and item 25 (sources of funding and other support (such as supply of drugs), role of funders). Arguably these additional items reflect details of trials which authors could have been expected to include irrespective of whether they were part of the CONSORT guidelines at the time. Regardless, the results of this study point to poor transparency in the reporting of clinical trials.

In all, the CONSORT guidelines have been available and publicised for nearly 20 years, yet they are still not fully adhered to. The push by rehabilitation journals to ensure trials adhere to the CONSORT guidelines is to be applauded; however, it is unlikely that this is going to be achieved by the self-imposed deadline of January 2015. Nonetheless, these efforts to increase transparent reporting of randomised controlled trials are an essential step towards progressing evidence-based care for people with SCI.

DATA ARCHIVING

There were no data to deposit.