Introduction

Stroke is a major cause of death and permanent disability worldwide2 and imposes a substantial economic burden on health care systems, accounting for 2% to 4% of total health care costs1. More than half of patients who have a stroke become moderately to severely disabled3,4. Stroke is also the leading cause of death and long-term disability in China5,6, where its rising incidence has created a serious public health problem7,8. Post-stroke patients present with musculoskeletal, sensorimotor, perceptual and cognitive deficits. Stroke rehabilitation can include interventions intended to reduce pain and spasticity and to increase range of motion (ROM), muscle force, mobility, ambulation, functionality, physical fitness and quality of life9. Acupuncture, a component of traditional Chinese medicine, is an accepted complementary treatment in stroke rehabilitation both in Asian countries10,11 and in the West12,13 because of its reported effects on spasticity, loss of function, loss of mobility, depression, aphasia, hemiplegia and pain14. Accordingly, acupuncture has been advocated in reviews14, has been evaluated by Stroke Engine14 and has been included in guidelines9 pertaining to stroke rehabilitation.

Systematic reviews (SRs) of high-quality randomized controlled trials (RCTs) are considered the best evidence regarding specific health care interventions15,16. SRs, particularly high-quality ones, provide clinicians with more reliable findings and support conclusions and decisions regarding both patient care and health policy15,17,18. Many SRs of RCTs of acupuncture have been published with conflicting results: some have described significant beneficial effects in stroke rehabilitation9,14, whereas others have failed to demonstrate any effect10,14,19,20,21,22,23. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach24 is designed to rate the quality of a body of evidence; it can be applied to SRs and other forms of evidence, including health technology assessments, and can be used to determine the strength of guidelines or recommendations24,25,26,27. GRADE clearly distinguishes evidence quality from recommendation strength and considers factors other than the evidence itself when suggesting therapeutic approaches25. The result is a recommendation for or against an intervention based on whether its potential benefits outweigh its potential harms and burdens24, as well as on patient values and preferences.

Despite numerous publications regarding acupuncture in stroke rehabilitation10,14,19,20,21,22,23, the evidence included in these SRs has not been systematically evaluated using the GRADE approach28,29. The aims of the current study were to review the quality of the evidence in SRs of acupuncture in stroke rehabilitation and to use the GRADE approach to rate the strength of recommendation for its use based on this evidence.

Methods

Criteria for Inclusion

PICOS approach

The Population, Intervention, Comparator, Outcome and Study design (PICOS) approach was used to frame our research objectives.

Study Design

SRs containing at least one RCT were included in this study.

Study Participants

Eligible participants were patients with hemorrhagic or ischemic stroke at any stage or severity (including cerebral infarction, intracerebral hemorrhage, cerebral embolism and unclassified stroke), regardless of age, sex or severity of neurological deficit, who were diagnosed either (1) by brain computed tomography (CT) or brain magnetic resonance imaging (MRI) or (2) clinically, according to the World Health Organization definition of stroke (rapidly developing focal or global disturbances of cerebral function lasting more than 24 hours or leading to death, with no apparent cause other than vascular origin23).

Intervention

Eligible interventions were traditional acupuncture (needling at classical meridian points) or contemporary acupuncture (needling at non-meridian points or trigger points), regardless of the acupuncture technique or stimulation method (manual stimulation, ear acupuncture, abdominal acupuncture, wrist-ankle acupuncture, scalp acupuncture, fire needle, warm-needle moxibustion or electrical stimulation). SRs in which the acupuncture treatment did not involve needling, such as point injection, acupressure, laser acupuncture, tap-pricking or cupping on pricked superficial blood vessels, were excluded.

Comparison

The control interventions included sham acupuncture, placebo acupuncture or other conventional treatments (including Western medical treatments, traditional Chinese medical treatments other than acupuncture, physical therapy, occupational therapy and speech therapy). Sham treatment consisted of either (1) needle pricking on the skin surface, with needles placed close to, but not at, the acupuncture points30 or (2) subliminal electrical skin stimulation via electrodes attached to the skin. Placebo acupuncture referred to a needle that touched but did not penetrate the skin at the same acupuncture points used in the treatment group30. SRs comparing acupuncture on the affected side with acupuncture on the unaffected side were also considered for inclusion.

Outcome Measures

We convened a panel of twelve experts in neurology, neurosurgery, cerebrovascular disease, acupuncture and traditional Chinese medicine at our hospital. The experts compiled as comprehensive a list as possible of outcomes relevant to stroke rehabilitation; each expert then privately rated every outcome from 1 to 9 according to its clinical importance (1, least important; 9, most important). In general, outcomes of long-term interest to patients and acupuncture-associated adverse events (bent needle, stuck needle, broken needle, fainting, injury to important organs, infection and bleeding) were considered important. The individual ratings were aggregated as the median score for each outcome, and each outcome was classified into one of three categories according to its importance for clinical decision making: critical (median score 7 to 9), important but not critical (median score 4 to 6) or of limited importance (median score 1 to 3)24 (Table 1). Critical and important outcomes were used for decision making and were included in the evidence profile24.

Table 1 Rating scale for outcome ranking according to clinical importance.
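The ranking rule described above can be sketched in Python. This is an illustration under stated assumptions, not the panel's actual procedure: each expert is assumed to give one integer rating from 1 to 9 per outcome, the twelve ratings shown are hypothetical, and, because an even-sized panel can produce a half-point median such as 6.5, the sketch assigns such values to the lower category (the text does not say how these cases were handled).

```python
from __future__ import annotations

from statistics import median


def classify_outcome(ratings: list[int]) -> tuple[float, str]:
    """Return the median panel rating and its importance category (sketch)."""
    if not ratings or any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers from 1 to 9")
    m = median(ratings)
    # Assumption: half-point medians (e.g. 6.5) fall into the lower band.
    if m >= 7:
        return m, "critical"
    if m >= 4:
        return m, "important but not critical"
    return m, "limited importance"


# Hypothetical ratings from a twelve-member panel for one outcome.
panel = [9, 8, 8, 7, 7, 7, 6, 6, 9, 8, 7, 5]
print(classify_outcome(panel))  # (7.0, 'critical')
```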

Literature Sources and Search Strategy

To identify SRs, we searched “pre-appraised” evidence resources from their inception until September 2014. These resources filter the literature to include only high-quality studies and are regularly updated, so the evidence they provide is current31. The websites searched are listed in the supplemental methods.

An extensive literature search was conducted in the following databases: Medline (Ovid, 1966 to September 2014), Embase (Ovid, 1974 to September 2014), the Cochrane Database of Systematic Reviews (Ovid, 1991 to September 2014), PsycINFO (inception to September 2014), ScienceDirect (inception to September 2014), the China National Knowledge Infrastructure/China Academic Journals Full-text Database (CNKI, 1994 to September 2014), the Chinese Biomedical Literature Database (CBM, 1978 to September 2014), the Chinese Scientific Journals Database (VIP, 1989 to September 2014), the Traditional Chinese Medicine (TCM) Database (1949 to September 2014) and the Wanfang Database (1998 to September 2014). No language restrictions were applied. The Medline (Ovid) search strategy is listed in Appendix 1; it was adapted for the other databases, using Chinese search terms for the Chinese databases. We manually searched four Chinese acupuncture journals (Chinese Acupuncture and Moxibustion, the Journal of Clinical Acupuncture and Moxibustion, Acupuncture Research and the Shanghai Journal of Acupuncture and Moxibustion) for articles published between 1980 and November 2014. We also checked the reference lists of all relevant SRs identified and contacted their authors to identify additional relevant SRs.

SR Selection

Two authors (Zhang X and Liu XT) independently screened the titles and abstracts of the articles and retrieved the full texts of potentially eligible articles for further review against the selection criteria outlined above. Disagreements between the two authors were resolved by a third member of our research group (Kang DY), who reviewed the article and decided whether it should be included or excluded.

Selecting SRs of High Methodological Quality

The methodological quality of each SR was independently assessed by two reviewers using “A measurement tool to assess the methodological quality of systematic reviews” (AMSTAR32) and the Oxman-Guyatt Overview Quality Assessment Questionnaire (OQAQ)33, which contain 11 and 9 assessment criteria, respectively. Following previous studies34,35,36, each criterion answered “Yes” scored one point and all other answers scored zero, and a total score was calculated separately for each tool. Reporting quality was assessed using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement37. SRs of high methodological quality were identified using the rating system of the Canadian Agency for Drugs and Technologies in Health (CADTH)38: based on the overall AMSTAR score, each SR was rated as being of “high” (9–11), “moderate” (5–8) or “low” (0–4) quality. For the OQAQ, SRs with an overall score of ≥7 were rated as high quality. All relevant articles were assessed by two authors (Zhang X and Liu XT) using both scales, and SRs of high methodological quality on either AMSTAR or OQAQ were used for data collection. All authors discussed any queries frequently throughout the rating process. Agreement validation and reliability were described in our previous article39.
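The scoring and screening rules can be sketched as follows. The function names are hypothetical and the answer sheets are invented for illustration; only the scoring scheme and cut-offs come from the text (one point per “Yes”; AMSTAR high 9–11, moderate 5–8, low 0–4; OQAQ high ≥7; an SR is retained if it is high quality on either tool).

```python
from __future__ import annotations


def tool_score(answers: list[str]) -> int:
    """One point per "Yes"; any other answer (No, Can't answer, Not applicable) scores 0."""
    return sum(1 for a in answers if a.strip().lower() == "yes")


def amstar_category(score: int) -> str:
    """CADTH bands for the 11-item AMSTAR total."""
    if not 0 <= score <= 11:
        raise ValueError("AMSTAR total must lie between 0 and 11")
    if score >= 9:
        return "high"
    if score >= 5:
        return "moderate"
    return "low"


def is_high_quality(amstar_answers: list[str], oqaq_answers: list[str]) -> bool:
    """Retain an SR that is high quality on EITHER tool (AMSTAR >= 9 or OQAQ >= 7)."""
    return tool_score(amstar_answers) >= 9 or tool_score(oqaq_answers) >= 7


# Hypothetical answer sheets for one SR (11 AMSTAR items, 9 OQAQ items).
amstar = ["Yes"] * 8 + ["No", "Can't answer", "Not applicable"]
oqaq = ["Yes"] * 7 + ["No", "No"]
print(tool_score(amstar), amstar_category(tool_score(amstar)))  # 8 moderate
print(is_high_quality(amstar, oqaq))  # True: the OQAQ total reaches 7
```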

Data Collection

Using a standardized form, two authors (Zhang X and Liu XT) independently extracted data from the SRs, including participant characteristics (age, sex), intervention details, measured outcomes, number of included trials, sample sizes of each group, diagnostic criteria, TCM syndrome classification, study methodology, original RCT quality and disease duration and state. The intervention details included meridian points, stimulation sources, drugs, medication doses, therapeutic regimens and treatment durations. Additional data and methodological information were obtained from the original RCT reports. We sought clarification from SR and RCT authors if an SR did not include clear descriptions of either the information or the methodologies used. Disagreements were resolved via discussion and consensus with a third researcher (Kang DY).

The GRADE Approach

SR Evidence Quality

The GRADE approach24 was used to rate the quality of the evidence from each available SR and the strength of recommendations regarding the use of acupuncture in stroke. Results for the primary outcomes of the SRs rated as being of high methodological quality (AMSTAR score ≥9 or OQAQ score ≥7) were extracted, critically appraised and used to construct a body of evidence.

Two authors (Zhang X, a clinician with expertise in TCM, and Kang DY, a methodologist with expertise in research methods and SRs) were trained in the use of the GRADE tool, which they obtained at the 22nd Cochrane Colloquium (Hyderabad, India, 21 to 26 September 2014). For each SR, the two authors independently used the GRADE tool to evaluate the evidence for key outcomes. The evidence was rated down according to five primary domains (risk of bias, inconsistency, indirectness, imprecision and publication bias), yielding an overall quality rating of high, moderate, low or very low24.
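The rating-down logic can be sketched as below, assuming that evidence from RCTs starts at “high”, that each domain can lower the rating by zero, one (serious) or two (very serious) levels, and that the rating cannot fall below “very low”. Upgrading factors and the judgment GRADE requires within each domain are not modelled, and the example ratings are hypothetical.

```python
from __future__ import annotations

LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = ("risk of bias", "inconsistency", "indirectness",
           "imprecision", "publication bias")


def grade_quality(concerns: dict[str, int], start: str = "high") -> str:
    """Rate down from the starting level.

    `concerns` maps a GRADE domain to 0 (not serious), 1 (serious) or
    2 (very serious); domains left out are treated as not serious.
    Upgrading factors are deliberately not modelled.
    """
    unknown = set(concerns) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    if any(v not in (0, 1, 2) for v in concerns.values()):
        raise ValueError("each domain may rate down by 0, 1 or 2 levels")
    level = LEVELS.index(start) - sum(concerns.values())
    return LEVELS[max(level, 0)]  # cannot fall below "very low"


# Hypothetical ratings for one outcome from a body of RCT evidence.
print(grade_quality({"risk of bias": 1, "inconsistency": 1}))  # low
print(grade_quality({"imprecision": 1}))  # moderate
```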

From SR Evidence to Recommendations

The second component of the GRADE approach entails determining the strength of a recommendation, i.e., the level of confidence that the desirable effects (benefits) of acupuncture outweighed the undesirable effects (harms) or vice versa24. The recommendations were assigned to one of two categories: strong recommendations and weak recommendations (Supplemental Table 1).

In addition to the overall quality of the evidence (i.e., the confidence in the effect estimates for the important outcomes), two further key factors were used to determine the strength of each recommendation: the best estimates of the magnitude of the effects on desirable and undesirable outcomes and the importance of those outcomes24.

Statistical Analysis

The data were entered into EpiData (version 3.1)40 and exported to SPSS 21.0 (SPSS, Inc., Chicago, IL, USA) for analysis. GRADEpro41 was used to grade both the quality of the evidence and the strength of the recommendations. Dichotomous data were summarized as rates and proportions, and continuous data as means (standard deviations) or medians (ranges). If the number of included SRs was sufficient, inter-rater reliability between the two reviewers was to be calculated for each GRADE domain and for the overall quality of evidence using kappa coefficients. Two-tailed P values ≤0.05 were considered statistically significant.
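As an illustration of the planned agreement statistic, the sketch below computes an unweighted Cohen's kappa using only the Python standard library. The text does not state which kappa was intended (a weighted kappa may suit ordinal GRADE levels better), the analysis itself was to be run in SPSS, and the ratings shown are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters assigning categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical overall-quality ratings for ten outcomes.
a = ["low", "low", "moderate", "moderate", "low",
     "very low", "moderate", "low", "low", "moderate"]
b = ["low", "moderate", "moderate", "moderate", "low",
     "very low", "moderate", "low", "very low", "moderate"]
print(round(cohens_kappa(a, b), 3))  # 0.683
```

Here the raters agree on 8 of 10 outcomes (0.80), chance agreement from the marginals is 0.37, and kappa is (0.80 − 0.37)/(1 − 0.37).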

Results

SR Selection

A total of 4750 SRs were initially screened, and eleven SRs were selected for quality assessment using PRISMA, AMSTAR and OQAQ. Three of these SRs met the inclusion criteria and were of high methodological quality (AMSTAR score ≥9 or OQAQ score ≥7) (Table 2). The literature search process and reasons for exclusion are detailed in Fig. 1.

Table 2 SR evidence included in the GRADE approach.
Figure 1 Flowchart of study selection.

Characteristics of the Included SRs and the Original RCTs

The three included SRs assessed 19 RCTs. The following interventions were investigated within the original RCTs and varied in duration, frequency and intensity:

  • (1) Acupuncture (classical, electrical and sham)

  • (2) Rehabilitation therapy

  • (3) Physical therapy

  • (4) Occupational therapy

  • (5) Speech therapy

  • (6) Traditional Chinese medicine

  • (7) Use of aspirin

The outcome measures, intervention effects, risk of bias assessments and other characteristics of the included RCTs and SRs are presented in Tables 3 and 4.

Table 3 Quality assessment and summary of findings using the GRADE approach.
Table 4 From SR evidence to recommendations.

Quality Assessment of the Overall Body of SR Evidence using the GRADE Approach

Only quantitative analyses from the three included SRs (Supplemental Table 2) were used to determine the overall quality of the evidence supporting each specific recommendation via the GRADE approach. The evidence pertaining to the nine critical outcomes was downgraded to either low or moderate quality due to various limitations.

Study Limitations

Although the included SRs were of good quality, as indicated by a median AMSTAR score of 9 and a median OQAQ score of 8, the original RCTs were of poor quality. Four of the original RCTs42,43,44,45 that used neurological function as a primary outcome did not report enough information to judge whether random sequence generation, allocation concealment, blinding or the handling of outcome data were adequate. The risk of bias assessments of the 19 included RCTs are presented in Supplemental Table 2. The evidence comparing acupuncture plus conventional stroke rehabilitation (CSR) with CSR alone was also downgraded because of a lack of blinding (Table 3). Inadequate reporting increases the possibility of bias and reduces the validity of GRADE assessments.

Inconsistencies in the Results

For the global neurological deficit outcome (acupuncture plus conventional care vs. conventional care alone), the results of the four studies42,43,44,45 were inconsistent: they varied with the control and combined interventions used, and between studies that enrolled only patients with ischemic stroke and those that enrolled patients with ischemic or hemorrhagic stroke. The meta-analysis of these four studies also showed statistical heterogeneity (I² = 63%, P = 0.04). We therefore judged the inconsistency to be serious. Inconsistency in the other outcomes was attributed primarily to differences among the studies in interventions, acupuncture details, stroke stages and methodological quality, and this evidence was downgraded by one level (Table 3).
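The heterogeneity statistic is related to Cochran's Q by I² = (Q − df)/Q, where df = k − 1 for k studies. The sketch below inverts this relationship to recover the Q implied by the reported I² of 63% for four studies and, using the closed-form chi-square upper-tail probability for exactly 3 degrees of freedom, checks that it is consistent with the reported P = 0.04. Because the reported I² is rounded, the reconstructed Q is approximate and is not the value from the original meta-analysis.

```python
import math


def i_squared(q: float, k: int) -> float:
    """Higgins' I^2 (in percent) from Cochran's Q and the number of studies k."""
    if k < 2:
        raise ValueError("I^2 needs at least two studies")
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0  # negative values are truncated to 0


def q_from_i_squared(i2_percent: float, k: int) -> float:
    """Invert I^2 = (Q - df) / Q to recover the Q implied by a reported I^2."""
    if not 0 <= i2_percent < 100:
        raise ValueError("I^2 must lie in [0, 100)")
    return (k - 1) / (1 - i2_percent / 100.0)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


q = q_from_i_squared(63, k=4)  # four studies -> df = 3
print(round(q, 2), round(i_squared(q, 4), 1), round(chi2_sf_df3(q), 3))  # 8.11 63.0 0.044
```

A Q of about 8.1 on 3 degrees of freedom gives P ≈ 0.044, in line with the reported P = 0.04.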

Indirectness of the Evidence

The SR46 reporting global neurological deficit as a critical outcome included only patients with a stroke duration of ≥1 month; consequently, its results may apply only to patients recovering from stroke (Supplemental Table 2). Nevertheless, we judged this indirectness not to be serious. Similarly, the evidence that acupuncture improved swallowing came from patients in the subacute stage (stroke onset within 2 to 28 days), and the evidence that it improved motor recovery and disability came from patients with moderate or severe stroke (Supplemental Table 2).

Precision

For six outcomes, the total number of participants was small (<400)24, the 95% CI included no effect (an OR or RR of 1.0) and the CI did not exclude important benefit (an increase in OR or RR of 25% or more)24; we therefore downgraded the quality of the evidence for these outcomes by one level for imprecision (Table 3).
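The three conditions named above can be checked mechanically, as in the sketch below. It assumes a ratio measure (OR or RR) coded so that values above 1 favour acupuncture, uses the 400-participant and 25% thresholds from the text, and applies a hypothetical rule of rating down one level only when all three conditions hold; in practice GRADE leaves this to judgment. The confidence intervals shown are invented.

```python
from __future__ import annotations


def imprecision_flags(total_n: int, ci_low: float, ci_high: float,
                      n_threshold: int = 400,
                      important_ratio: float = 1.25) -> dict[str, bool]:
    """Check the three imprecision conditions for a ratio measure (OR or RR).

    Assumes the outcome is coded so that ratios above 1 favour acupuncture.
    """
    if not 0 < ci_low <= ci_high:
        raise ValueError("expected 0 < ci_low <= ci_high")
    return {
        "small_sample": total_n < n_threshold,
        "ci_includes_no_effect": ci_low <= 1.0 <= ci_high,
        "ci_includes_important_benefit": ci_high >= important_ratio,
    }


def rate_down_levels(flags: dict[str, bool]) -> int:
    """Hypothetical rule: rate down one level only when all conditions hold."""
    return 1 if all(flags.values()) else 0


flags = imprecision_flags(total_n=240, ci_low=0.92, ci_high=1.61)
print(flags, rate_down_levels(flags))  # all three conditions hold -> 1
```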

Publication Bias

Publication bias could not be ruled out because of the limited number of trials pertaining to the included outcomes (Table 3).

Recommendations

The quality of the SR evidence, rated across the five GRADE domains, was used to grade the recommendations while taking patient characteristics into account24 (Table 4). Small relative effects of acupuncture (RR < 2)24 on both desirable and undesirable outcomes made a weak recommendation more likely. Furthermore, the overall quality of the evidence for the critical outcomes was low, and confidence in the effect estimates for other outcomes considered critical (those that may cause long-term harm) was also low; together, these factors resulted in weak recommendations. The strength of the recommendations regarding the use of acupuncture in patients with stroke is described in detail in Table 4.
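The reasoning above can be reduced to a simple rule, shown below purely as a hypothetical illustration. It is not the GRADE method, which weighs evidence quality, the balance of benefits and harms, values and preferences and resource use by judgment. The sketch treats an RR of at least 2 (or, by assumption, at most 0.5) as a large effect, following the text's description of RR < 2 as a small relative effect.

```python
def recommendation_strength(quality: str, rr: float, benefits_outweigh_harms: bool) -> str:
    """Hypothetical screening rule, not the GRADE method itself."""
    if quality not in ("high", "moderate", "low", "very low"):
        raise ValueError(f"unknown quality level: {quality}")
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    # The text treats RR < 2 as small; the symmetric 0.5 cut-off is an assumption.
    large_effect = rr >= 2.0 or rr <= 0.5
    if benefits_outweigh_harms and large_effect and quality in ("high", "moderate"):
        return "strong"
    return "weak"


print(recommendation_strength("low", rr=1.4, benefits_outweigh_harms=True))  # weak
```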

Discussion

We identified three high-quality SRs, from which seventeen RCTs reporting nine critical outcomes with quantitative analyses were used to determine the overall quality of the evidence supporting the GRADE recommendation that acupuncture benefits stroke rehabilitation (improvements in neurological function, swallowing and disability). Virtually none of the included SRs reported adverse effects of acupuncture, although such effects can occur even when it is performed by a well-trained, licensed acupuncturist. The weak strength of the recommendation implies that acupuncture should be prescribed with caution for patients with stroke symptoms and sequelae. We did not rate the values-and-preferences or resource-use domains of the recommendations because reliable data were unavailable.

Rating an overall body of evidence using the GRADE approach is becoming an important and recommended step in evidence synthesis24 and may improve the transparency of shared decision making, particularly when the quality of evidence is low or unclear. Before reaching a final quality rating, the GRADE approach scrutinizes the potential limitations of the whole body of evidence, including risk of bias, inconsistency of results, indirectness and imprecision (Table 3). This rating is valuable for end users of evidence syntheses (patients, clinicians and policy makers) because it indicates how much confidence they should place in the results, and producing it is a key step in translating a body of evidence into clinical practice24. Moreover, GRADE clearly distinguishes evidence quality from recommendation strength. Recommendations are formulated in a second step of the decision-making process, in which evidence quality is weighed together with other factors so that the strength of a recommendation can be judged correctly and transparently. This approach is realistic because, in clinical practice, therapeutic decisions cannot be based on evidence quality alone. To the best of our knowledge, this is the first study to apply the GRADE approach to evaluate SRs of acupuncture in stroke rehabilitation.

Several organizations have summarized the evidence regarding acupuncture in stroke rehabilitation, but they have reached conflicting conclusions. The Ottawa Panel9 published guidelines in 2006 on the use of acupuncture in the management of adults with stroke, recommending acupuncture as an adjunct treatment to improve specific outcomes during the acute and subacute stages of stroke rehabilitation. In contrast, based on the results of several SRs, Stroke Engine14 concluded that there is insufficient evidence to establish acupuncture as an effective treatment in stroke rehabilitation. Furthermore, the Evidence-Based Review of Stroke Rehabilitation (EBRSR)47 summarized several SRs published over the past 15 years with conflicting results: some found that acupuncture benefits the rehabilitation of stroke patients, whereas others did not. The EBRSR review rated neither the methodological quality of each SR nor the quality of the evidence for each outcome across all SRs (i.e., the body of evidence for an outcome), and it did not explain the conflicting results; it may therefore confuse clinicians who must make decisions regarding patients. In general, to help clinicians make evidence-based decisions, review authors should both rate the quality of the available evidence and make informed recommendations. Such recommendations balance the desirable and undesirable consequences of a treatment and may help clinicians and policy makers make better decisions regarding patient care. Future research should therefore focus on methods for synthesizing SR evidence into an overall body of evidence rather than on providing brief summaries of results.

We encountered several challenges when using GRADE. During the first round of risk of bias assessment, we struggled to determine how to integrate the quality assessments of the original RCTs into an assessment of the whole body of evidence. Poor reporting of “sequence generation”, “incomplete outcome data”, “allocation concealment” and “selective reporting” led us to downgrade for risk of bias15; in such instances, the overall body of evidence should be interpreted cautiously. The GRADE Handbook suggests that, in principle, the contribution of each RCT to the estimate of an effect (usually reflecting its sample size and number of outcome events) should be considered24; however, given the discrepancies among the risk of bias assessments of the individual RCTs, weighting our overall assessment by these contributions proved challenging and impractical. Moreover, the move away from overall quality scores or summaries toward a component approach for individual studies15 sits uneasily with the need to reach an overall risk of bias judgment for a group of RCTs. In addition, our study identified more than 10 outcomes within the 19 included RCTs; without a meta-analysis, no I² statistic was available to quantify heterogeneity, which made inconsistency difficult to assess. A qualitative approach can be used to assess consistency across studies (considering whether estimates are similar in the magnitude and direction of an effect and in statistical significance), but such an approach is subjective and unreliable. Similarly, imprecision is particularly difficult to assess across studies in the absence of a meta-analysis; we therefore included only outcomes with quantitative analyses. In addition to confidence intervals (CIs) and the line of no effect, a further criterion, the optimal information size (OIS), should be used to judge whether precision is adequate.
The OIS is the number of patients required by a conventional sample size calculation for a single adequately powered trial. If the total number of patients included in an SR is less than the OIS, the quality of the evidence should be downgraded for imprecision24. The further the number of patients in an SR falls below the number required for an adequately powered RCT, the more likely the evidence is to be downgraded for imprecision. In this study, the OIS could not be calculated because the original RCTs reported insufficient information on sample size calculation and no data on the target difference (Δ)24. The small sample sizes (n < 400) for the three critical outcomes may nonetheless support downgrading the quality of the evidence by one level24.
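To show what the missing calculation would involve, the sketch below computes an OIS for a continuous outcome with the standard normal-approximation formula for comparing two means, n per arm = 2(z_{1-α/2} + z_{1-β})² σ²/Δ². The target difference Δ and standard deviation σ are hypothetical, since the original RCTs did not report them; α = 0.05 and 80% power are conventional defaults rather than values from the text.

```python
import math
from statistics import NormalDist


def ois_continuous(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total sample size (both arms) for a two-arm trial comparing means.

    Normal approximation: n per arm = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2.
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    per_arm = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return 2 * math.ceil(per_arm)  # round each arm up, then sum both arms


def below_ois(total_n: int, ois: int) -> bool:
    """Consider rating down for imprecision when the pooled sample is below the OIS."""
    return total_n < ois


# Hypothetical target difference of 0.5 SD; the original RCTs did not report delta.
ois = ois_continuous(delta=0.5, sd=1.0)
print(ois, below_ois(240, ois))  # 126 False
```

With a smaller hypothetical target difference of 0.2 SD, the same function returns 786, so a pooled sample of 240 would fall well short of the OIS.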

There are several limitations to this study. A highly sensitive search strategy was used to identify current SRs; however, only three SRs were included for grading, possibly because most available SRs were of low methodological quality. The purpose of using GRADE was to obtain valid and reliable information to guide evidence-based decisions, and comprehensive, accessible pre-appraised resources support an evidence-based approach to decision making. If SRs are to be useful, serious consideration must be given to how they are conducted. When the SR evidence is strong and its implications are clear, the resulting body of evidence should influence decision making and shape health policy. The ultimate test of an SR is whether the way it was conducted inspires confidence that its conclusions are evidence-based and that each stage of the review was performed rigorously. The validity of the underlying SRs should therefore inform the analysis, interpretation and conclusions of a GRADE assessment; in particular, an SR of invalid studies may produce a misleading GRADE result, with a narrow confidence interval (good precision) around the wrong estimate of the intervention effect. For this reason, only high-quality SRs were selected for our study. However, our results may be biased because primary studies included only in lower-quality SRs, which were excluded during selection, were missed. Because the objective of this research was to synthesize evidence from SRs, we did not consider these potentially missing primary studies. Future research could synthesize evidence from primary studies, such as RCTs and cohort studies, using the GRADE approach and compare the results with those of the present study.
Moreover, apart from occasional bruising, the included SRs reported almost no adverse effects of acupuncture, even though such effects can occur when it is performed by a well-trained, licensed acupuncturist. We also did not perform sensitivity analyses to explore differences among the results.

Additionally, although two reviewers independently used the GRADE tool to rate the SRs, inter-rater agreement was not assessed because of the small number of SRs included. Some subjectivity is inherent in assessing both the quality of evidence and the strength of a recommendation; however, our decision process was transparent, and all authors discussed any queries frequently. Finally, we did not formally assess publication bias because too few studies were available; publication bias may therefore have affected our findings despite the sensitive search strategy.

Conclusion

In summary, we systematically assessed current evidence from SRs regarding the use of acupuncture in patients with stroke, using an innovative approach to assess both the quality of the SR evidence and the strength of recommendations for specific clinical procedures. Recent SRs of RCTs describe potential benefits of acupuncture for stroke rehabilitation; however, the overall body of evidence was of low quality. Our critical appraisal of this evidence using the GRADE approach yielded a weak recommendation regarding the use of acupuncture in stroke rehabilitation. High-quality, well-designed SRs and RCTs are needed to establish the role of acupuncture in stroke rehabilitation.

Additional Information

How to cite this article: Xin, Z. et al. GRADE in Systematic Reviews of Acupuncture for Stroke Rehabilitation: Recommendations based on High-Quality Evidence. Sci. Rep. 5, 16582; doi: 10.1038/srep16582 (2015).