Introduction

Stroke is a common cerebrovascular disease caused by blockage or rupture of the blood vessels responsible for supplying oxygen and nutrients to brain cells. There is growing public interest in this disorder, which is a major global cause of disability and mortality1. Among several complications associated with stroke, post-stroke depression (PSD) is one of the most common neuropsychiatric disorders2,3, affecting 30–35% of stroke patients4,5,6,7,8.

In the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), PSD is classified as “depressive disorder due to another medical condition9”, with five or more major depressive symptoms occurring for 2 weeks or more after the stroke10. Several studies have shown that depression is associated with physical disability/recovery and mortality11,12,13. Treatment of PSD is important not only for management of depressive symptoms, but also for improvement of stroke-related treatment outcomes such as the effectiveness of physical and cognitive rehabilitation, and survival rates2.

The treatment of PSD is not markedly different from that of depressive disorder, and pharmacological treatments using antidepressants, as well as psychotherapy, are frequently recommended14,15. Several systematic reviews have supported the efficacy of antidepressants for reducing depressive symptoms16,17, but they may not improve the activities of daily living (ADL) of PSD patients, and are more frequently associated with adverse events (AEs) than are placebo treatments17,18,19. Commonly identified AEs, particularly among elderly patients, include an increased risk of falls, hyponatremia, mortality, attempted suicide, and fracture20,21. Importantly, the use of antidepressants is also associated with an increased risk of stroke recurrence22.

Therefore, the development of a safe and effective alternative treatment for PSD may complement the existing antidepressant-centered strategy, particularly for patients with contraindications to antidepressants. Many aspects of PSD may respond to herbal medicine (HM), one of the modalities of complementary and alternative medicine, because of its multi-compound, multi-target nature that potentially benefits neurological function, rehabilitation outcome, quality of life, and depressive symptoms23. Sihogayonggolmoryeo-tang (SGYMT, also known as Saiko-ka-ryukotsu-borei-to) is an HM consisting of 11 herbs. It was first introduced in the classical Chinese text “Treatise on Cold Damage Diseases” in the 3rd century. This prescription has since been recommended for several neuropsychiatric conditions including psychological anxiety, neurosis, and irritability24. In a recent meta-analysis of 8 randomized controlled trials (RCTs), SGYMT used as a monotherapy or adjunctive therapy to antidepressants was more effective for treating depression than antidepressants alone25. Moreover, experimental studies reported that SGYMT had an antidepressant effect by preventing prefrontal cortex dysfunction26, and was as effective as the tricyclic antidepressant (TCA) imipramine27. In addition, an RCT reported that 3 months of SGYMT administration modulated dyslipidemia, a risk factor for ischemic stroke, suggesting a positive indirect effect on stroke-related outcomes28. Importantly, the use of SGYMT for treating PSD was recommended in a recent traditional Korean medicine (TKM) clinical practice guideline (CPG) in Korea24.

However, the use of SGYMT, which may complement the limitations of conventional therapies for PSD, has not yet been systematically and critically reviewed. The objective of this review is to analyze the effectiveness and safety of SGYMT as monotherapy or adjunctive therapy in patients with PSD using a systematic review methodology to help clinicians establish evidence-based treatment strategies for this disorder.

Materials and Methods

This systematic review was conducted according to the guidelines in the Cochrane Handbook for Systematic Reviews of Interventions29. We reported the review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines30. The protocol was published31 and registered in PROSPERO (registration number CRD42018102939).

Data sources and search strategy

This method was carried out as described previously31. Two researchers (CY Kwon and B Lee) performed independent comprehensive searches of the following 13 databases: 6 English-language databases (MEDLINE via PubMed, EMBASE via Elsevier, the Cochrane Central Register of Controlled Trials [CENTRAL], the Allied and Complementary Medicine Database [AMED] via EBSCO, the Cumulative Index to Nursing and Allied Health Literature [CINAHL] via EBSCO, and PsycARTICLES via ProQuest), 5 Korean-language databases (Oriental Medicine Advanced Searching Integrated System [OASIS], Koreanstudies Information Service System [KISS], Research Information Service System [RISS], Korean Medical Database [KMbase], and Korea Citation Index [KCI]), and 2 Chinese-language databases (China National Knowledge Infrastructure [CNKI] and Wanfang Data). The initial search date was July 31, 2018, and we conducted another search for updated information on July 2, 2019 to provide more up-to-date and comprehensive evidence. We identified additional trials using the reference lists of relevant papers and a manual search on Google Scholar. In addition to peer-reviewed articles in scientific journals, we included grey literature such as degree theses and conference proceedings. There was no restriction on language. The following search terms were used in MEDLINE: (“depressive disorder” [MeSH Terms] OR “depression” [MeSH Terms] OR depressive OR depression) AND (“stroke” [MeSH Terms] OR stroke) AND (Chai-Hu-Jia-Long-Gu-Mu-Li-Tang OR Chai-Hu-Jia-Long-Gu-Mu-Li-Wan OR Chai-Hu-Jia-Long-Gu-Mu-Li-Pian OR Saikokaryukotsuboreitou OR Saikokaryukotsuborito OR Sihogayonggolmoryeo-tang) (Supplemental Digital Content 1, which describes the details of search terms used in all databases).

Inclusion criteria

Types of studies

This method was carried out as described previously31. We included only RCTs, and excluded quasi-RCTs using inappropriate random sequence generation methods. Studies using the expression “randomization” (随机) without descriptions of randomization methods were included. We included both parallel and crossover studies. In crossover designs, only first-phase data were used to calculate the effect size and in the meta-analysis. Other designs such as in vivo, in vitro, case reports, retrospective studies, and non-randomized controlled trials were excluded.

Participant characteristics

This method was carried out as described previously31. We included studies on patients diagnosed with depression following stroke using standardized diagnostic tools such as the DSM-5, regardless of sex, age, or race. Studies were excluded if the participants had drug allergies or other serious illnesses such as cancer, liver disease, or kidney disease.

Intervention types

This method was carried out as described previously31. We included studies using SGYMT, i.e. 11 kinds of herbs including Bupleuri Radix, Pinelliae Rhizoma, Ramulus Cinnamomi, Poria, Scutellariae Radix, Jujubae Fructus, Ginseng Radix or Codonopsis Radix, Ostreae Concha, Fossilia Ossis Mastodi, Zingiberis Rhizoma Recens, and Rhei Rhizoma. Given that HMs, such as SGYMT, are also known as so-called “modified HM,” which allow some modifications of their compositions to achieve increased efficacy32,33,34, we also included studies using modified SGYMT, which was defined in this review as SGYMT containing more than 50% of the original prescription composition (i.e. HM designated as “modified SGYMT”, which contained 6 or more of the 11 basic components). We allowed the use of any form of SGYMT. Studies combining SGYMT with other therapies as treatment interventions were included, if the other therapies were used equally in both the treatment and control groups. For the control intervention, we included studies that used placebos, no treatment, and conventional medical treatments. We excluded studies using HM as the control intervention because these studies could not yield the net effect of SGYMT. There were no other restrictions regarding the control intervention.
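The eligibility rule for modified SGYMT (6 or more of the 11 basic components) can be illustrated with a minimal sketch. The function names and the example formula below are hypothetical and are not taken from the review; the sketch assumes that Ginseng Radix and Codonopsis Radix occupy a single component slot, as the "or" in the component list suggests.

```python
# The 11 basic component slots of SGYMT as listed in the text.
# Assumption: Ginseng Radix and Codonopsis Radix are alternatives sharing one slot.
BASIC_SLOTS = [
    {"Bupleuri Radix"},
    {"Pinelliae Rhizoma"},
    {"Ramulus Cinnamomi"},
    {"Poria"},
    {"Scutellariae Radix"},
    {"Jujubae Fructus"},
    {"Ginseng Radix", "Codonopsis Radix"},
    {"Ostreae Concha"},
    {"Fossilia Ossis Mastodi"},
    {"Zingiberis Rhizoma Recens"},
    {"Rhei Rhizoma"},
]


def count_basic_components(formula):
    """Count how many of the 11 basic slots are present in a formula."""
    herbs = set(formula)
    return sum(1 for slot in BASIC_SLOTS if slot & herbs)


def is_eligible_sgymt(formula, minimum=6):
    """Return True if the formula retains at least `minimum` basic components."""
    return count_basic_components(formula) >= minimum


# Hypothetical modified formula: 6 basic herbs plus 1 added herb.
example = [
    "Bupleuri Radix", "Pinelliae Rhizoma", "Poria", "Ostreae Concha",
    "Fossilia Ossis Mastodi", "Codonopsis Radix", "Curcumae Radix",
]
print(count_basic_components(example), is_eligible_sgymt(example))
```

Under this reading, added herbs do not affect eligibility; only the number of retained basic components is counted.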

Outcome measures

This method was carried out as described previously31. The primary outcome measures were (1) post-treatment value in the degree of depression measured by the Hamilton Depression Scale (HAMD)35 or Beck Depression Inventory (BDI)36 and (2) AEs measured by the Treatment Emergent Symptom Scale (TESS)37 or the incidence. The secondary outcome measures included total effective rate (TER), a non-validated outcome measure that is processed secondarily according to certain evaluation criteria such as clinical symptom improvement, or the improvement rates of other quantified outcomes. In the assessment of TER, participants are generally classified as “cured”, “markedly improved”, “improved”, or “non-responder” after treatment. TER is calculated consistently using the following formula: TER = (N1 + N2 + N3)/N, where N1, N2, N3, and N are the number of patients who are cured, markedly improved, improved, and the total sample size, respectively. As further secondary outcome measures, we evaluated post-treatment neurological function using the National Institutes of Health Stroke Scale (NIHSS), a tool used to quantify stroke-related impairment38; ADL using the Barthel index, a tool used to describe ADL and mobility39; and quality of life using the 36-Item Short Form Health Survey, a patient-reported survey of their own health40.
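As a brief illustration of the TER definition, the following sketch sums the three responder categories before dividing by the total sample size. The function name and the patient counts are hypothetical and do not come from any included study.

```python
def total_effective_rate(cured, markedly_improved, improved, total):
    """TER: proportion of patients classified as cured, markedly improved, or improved."""
    if total <= 0:
        raise ValueError("total sample size must be positive")
    responders = cured + markedly_improved + improved
    if responders > total:
        raise ValueError("responders cannot exceed the total sample size")
    return responders / total


# Hypothetical arm: 10 cured, 12 markedly improved, 8 improved, 5 non-responders.
ter = total_effective_rate(10, 12, 8, 35)
print(f"TER = {ter:.1%}")
```

Because TER is a proportion of responders, it is a dichotomous outcome and is pooled as a risk ratio rather than as a mean difference.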

Study selection

After removing duplicates, two researchers (CY Kwon and B Lee) independently screened the titles and abstracts of all searched studies for relevance and then evaluated the full texts of the eligible studies for final inclusion. Any disagreement about study selection was resolved through discussion with other researchers, as previously reported31.

Data extraction

This method was carried out as described previously31. Two researchers (CY Kwon and B Lee) independently performed and crosschecked the data extraction using a standardized data collection form (Excel 2007, Microsoft, Redmond, WA, USA). Discrepancies were resolved through discussion with other researchers. The extracted items included the first author’s name; year of publication; country; sample size and number of dropouts; details about the participants, HM, control intervention, and comparisons; duration of the intervention; outcome measures; and AEs associated with interventions. We contacted the corresponding authors of the included studies by e-mail to request additional information if the data were insufficient or ambiguous.

Quality assessment

This method was carried out as described previously31. Two researchers (CY Kwon and B Lee) independently assessed the methodological quality of all included studies, and the quality of evidence for each main finding. We resolved discrepancies through discussion with other researchers.

The methodological quality of the included studies was evaluated using both the Cochrane Collaboration’s risk of bias tool41 and the Jadad scale42. Using the Cochrane risk of bias tool, the following domains were assessed: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessments, incomplete outcome data, selective reporting, and other potential biases for each included study. Each domain was categorized into one of three groups: “low risk,” “unclear,” or “high risk.” In the random sequence generation domain, we assessed a study as high risk of bias if the expression “randomization” was mentioned without a description of randomization methods. We assessed other potential sources of bias with particular emphasis on possible baseline imbalances arising from a priori selection characteristics for treatment and control groups, such as mean participant age, or baseline depression level. Baseline imbalance arising from selection characteristics that are strongly related to outcome measures may bias the estimation of intervention effects in RCTs41. Using the Jadad scale, we assessed the randomization method, blinding, and descriptions of withdrawals and dropouts, and the total score is presented on a scale of 0–5.

The quality of evidence for each main outcome was evaluated by using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach43. Using the online program GRADEpro (https://gradepro.org/), we assessed the risk of bias; inconsistency, indirectness, and imprecision of the results; and the probability of publication bias using a four-item scale (“Very low”, “Low”, “Moderate”, or “High”).

Data synthesis and analysis

This method was carried out as described previously31. We used Review Manager version 5.3 software (Cochrane, London, UK) for data synthesis and analysis. Descriptive analyses of details of the participants, interventions, and outcomes were conducted for all included studies. Meta-analysis was performed for studies using the same types of intervention, comparison, and outcome measure. We pooled continuous outcomes as the mean difference (MD) with 95% confidence intervals (CIs), and dichotomous outcomes as a risk ratio (RR) with 95% CIs. Heterogeneity of effect measures between studies was assessed using both the chi-squared test and the I-squared statistic (I2). We considered I2 values greater than 50% and 75% indicative of substantial and high heterogeneity, respectively. In the meta-analyses, a random-effects model was used when the heterogeneity was significant (I2 > 75%), while a fixed-effects model was used when the heterogeneity was non-significant. We planned to do this; however, during the review process we learned that this practice was no longer supported and that a random-effects model was preferable, given the potential heterogeneity in true treatment effects due to differences in the treatment components, research groups, and patient selection criteria among the included studies. Therefore, we reported both the results of the models that were pre-registered and those of potentially more appropriate random-effects models. However, we used only fixed-effects models when fewer than 5 studies were included in the meta-analysis, because in that case the estimates of between-study variance had poor accuracy44,45.
If the necessary data were available, we conducted a subgroup analysis to account for the heterogeneity or to assess whether the treatment effects vary between subgroups according to the following criteria: (1) the treatment period; (2) the dosage form of SGYMT, such as decoctions or granules; (3) the presence or absence of a placebo; (4) the severity of depression; and (5) the types of antidepressants used. In addition, we performed sensitivity analyses to identify the robustness of meta-analysis results by excluding (1) studies with high risks of bias, (2) studies with missing data, and (3) outliers that are numerically distant from the rest of the data. If more than 10 trials were included in the meta-analysis, reporting biases such as publication bias were assessed using funnel plots. When reporting bias was implied by funnel plot asymmetry, we attempted to explain possible reasons for this. Additionally, we used Egger’s linear regression analysis and Begg and Mazumdar’s rank correlation analysis to assess publication bias with Stata/MP version 15.1 software46,47.
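The pooling and heterogeneity statistics described in this section can be sketched as follows. This is a minimal illustration using generic inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance; it is not the Review Manager implementation, which may use other weighting schemes for dichotomous outcomes (for example, Mantel-Haenszel). The function name and the input values are hypothetical. For risk ratios, the inputs would be log risk ratios and their standard errors.

```python
import math


def pool_inverse_variance(effects, std_errors, z=1.96):
    """Pool study effects with fixed-effect and DerSimonian-Laird random-effects models.

    effects: per-study effect sizes (e.g. mean differences or log risk ratios)
    std_errors: standard errors of those effect sizes
    """
    k = len(effects)
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)

    # Fixed-effect estimate: inverse-variance weighted mean.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q and the I-squared statistic (percentage).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate: weights include tau^2.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum_re

    def ci(estimate, total_weight):
        half_width = z / math.sqrt(total_weight)
        return (estimate - half_width, estimate + half_width)

    return {
        "fixed": fixed, "fixed_ci": ci(fixed, sum_w),
        "random": random, "random_ci": ci(random, sum_re),
        "q": q, "i2": i2, "tau2": tau2,
    }


# Hypothetical mean differences and standard errors from three studies.
result = pool_inverse_variance([-2.5, -1.8, -2.1], [0.6, 0.5, 0.8])
print(result)
```

When the estimated between-study variance is zero, the two models coincide; as it grows, the random-effects weights become more equal across studies and the confidence interval widens.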

Results

Description of included studies

We identified a total of 101 records through database searching. After screening of titles and abstracts, 38 articles were considered to be relevant. Among them, 1 review article, 4 non-RCTs or quasi-RCTs, 5 not describing the diagnostic criteria of PSD, and 7 not describing the contents of conventional medication prescribed were excluded by reviewing the full-texts. In total, 21 RCTs with 1,644 participants were included in this review and meta-analysis (Fig. 1)48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68.

Figure 1

PRISMA flow chart of the study selection process. Moher, D. et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009; 6(7): e1000097.

The general characteristics of the included studies are summarized in Table 1. All RCTs were conducted in China. One was a thesis50, 1 was a conference proceeding48, and the remaining 19 were journal articles. Thirteen RCTs compared SGYMT to antidepressants48,49,50,51,52,53,54,55,56,57,58,59,60, and the other 8 compared SGYMT combined with antidepressants to antidepressants alone61,62,63,64,65,66,67,68. We were unable to find any placebo-controlled trials. Sample sizes ranged from 48 to 165 with a median of 70, and treatment periods ranged from 14 to 90 days with a median of 42 days. Five studies48,52,59,60,64 recruited participants with specific traditional Chinese medicine (TCM) symptom patterns; this approach enables individual treatment by categorizing the signs and symptoms of patients into a series of syndrome concepts69: four48,52,60,64 were associated with stagnation of the liver or qi, and the remaining one59 was associated with liver-kidney yin deficiency. As control interventions, a total of three types of antidepressants were used: selective serotonin reuptake inhibitors (SSRIs) in nine50,52,53,54,56,57,60,63,68, TCA in three49,55,58, and flupentixol/melitracen in nine48,51,59,61,62,64,65,66,67. In most cases, routine care for stroke (RCS) using pharmaceutical anti-platelet, anti-coagulation, and neurotrophic agents, and vasodilators, was performed for both groups. In one study53, psychotherapy was performed with the RCS for both groups. The most frequently used outcome was TER in 18 studies48,49,50,51,52,54,55,56,57,58,59,60,61,62,63,64,65,67, followed by HAMD in 15 studies48,50,51,52,53,56,57,59,61,62,63,64,65,66,67, NIHSS in 6 studies51,53,61,65,67,68, Barthel index in 4 studies50,53,60,62, and China Stroke Scale (CSS) in 3 studies50,52,57. Seven different calculation methods of TER were used, and among them, 13 studies48,49,50,51,52,54,55,56,57,58,60,62,63 calculated TER based on HAMD, 3 studies50,57,60 based on a stroke scale, i.e. NIHSS or CSS, 2 studies65,67 based on both depression and stroke scales, i.e. HAMD and NIHSS, 2 studies59,61 based on clinical symptoms, and the remaining 1 study64 based on both the clinical symptoms and the TCM symptom score. Two studies reported the approval of an institutional review board (IRB)51,68, and 11 studies reported that they had received consent from the participants51,52,53,56,59,60,62,64,66,67,68.

Table 1 Characteristics of included studies.

Methodological quality

Based on analysis using the Cochrane risk of bias tool, eight studies48,51,52,54,58,63,66,68 using appropriate methods of random sequence generation, such as computerized random number tables, were considered to have a low risk of bias on the random sequence generation domain. The remaining 13 studies49,50,53,55,56,57,59,60,61,62,64,65,67 were considered to have a high risk of bias because they did not describe their random sequence generation methods. No studies reported allocation concealment, or blinding of participants, personnel, and outcome assessors. The domain of participant and personnel blinding was rated as a high risk of bias in all studies, given that no study used placebos. For the 2 studies that reported dropouts54,58, the incomplete outcome data domain was rated as low and high risk of bias, respectively, according to how missing data were handled: intention-to-treat analysis54 or per-protocol analysis58. None of the included RCTs had published study protocols. Four studies that reported only TER as an outcome49,54,55,58, 1 that did not report the results of outcomes that were nonetheless described in the Methods section59, 1 that assessed HAMD but did not report the raw data60, and 1 that did not report depression-related outcomes68 were rated with a high risk of bias in the selective reporting domain. Although we contacted the corresponding authors of 2 of these studies via e-mail to obtain raw data54,60, we received no replies. All studies reported no significant baseline difference in demographic data between the two groups, and were rated as having low risk of bias in the other potential sources of bias domain (Figs 2 and 3). Based on the Jadad scale, the mean score was 2.38 (SD 0.50); 8 studies48,51,52,54,58,63,66,68 had a total score of 3 and 13 studies49,50,53,55,56,57,59,60,61,62,64,65,67 had a total score of 2 (Table 1 and Supplemental Digital Content 2).
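The reported Jadad summary can be checked directly from the score distribution given above (8 studies scoring 3 and 13 studies scoring 2). The sketch below assumes that the reported SD is the sample standard deviation (n − 1 denominator); the review does not state which form was used.

```python
import statistics

# Score distribution reported in the text: 8 studies with 3 points, 13 with 2 points.
scores = [3] * 8 + [2] * 13

mean_score = statistics.mean(scores)
sample_sd = statistics.stdev(scores)       # n - 1 denominator (assumed here)
population_sd = statistics.pstdev(scores)  # n denominator, shown for comparison

print(f"mean = {mean_score:.2f}, sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

The sample standard deviation rounds to the reported 0.50, whereas the population form rounds to 0.49, which supports the assumption.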

Figure 2

Risk of bias graph for all included studies.

Figure 3

Risk of bias summary for all included studies. Low, unclear, and high risk, respectively, are represented with the following symbols: “+”, “?”, and “−”.

Details of SGYMT administration

The decoction dosage form was used in all studies except for 2 using granules48,60. Except for 2 that did not report medication frequency55,58, 19 studies instructed patients to take prescriptions twice a day. Twenty-five types of herbs were used in addition to the 12 basic components (the 11 components of the original prescription, with Ginseng Radix and Codonopsis Radix counted separately). Except for Ginseng Radix (28.57%), used as a substitute for Codonopsis Radix, the remaining 11 basic herbs were used at 61.90–100% frequency in the included studies. In particular, Bupleuri Radix, Pinelliae Rhizoma, and Fossilia Ossis Mastodi were used in all studies (all, 100%), and Poria and Ostreae Concha were used in 20 studies (both, 95.24%). The 25 additional herbs showed 4.76–42.86% frequency of use depending on the type, among which Curcumae Radix and Glycyrrhizae Radix were the most frequent (42.86% each), followed by Astragali Radix, Hoelen cum Pini Radix, and Angelicae Gigantis Radix (28.57% each) (Supplemental Digital Content 3, which describes the details of SGYMT and herbs added to the original SGYMT formulation).
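The frequencies above are proportions of the 21 included studies. A minimal sketch of the conversion follows; apart from the 20 studies stated in the text, the study counts are inferred here from the reported percentages and are not given explicitly in the review.

```python
N_STUDIES = 21  # total number of included RCTs


def frequency_percent(n_studies_using_herb):
    """Percentage of included studies that used a given herb."""
    return round(100 * n_studies_using_herb / N_STUDIES, 2)


# Counts inferred from the reported percentages (assumption), except 20 and 21.
for count in (1, 6, 9, 13, 20, 21):
    print(count, frequency_percent(count))
```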

SGYMT versus antidepressants

Efficacy

The meta-analysis showed that HAMD scores were significantly lower in the SGYMT group (8 studies48,50,51,52,53,56,57,59; MD −2.08, 95% CI −2.62 to −1.53, I2 = 34%) (Fig. 4), and TERs based on depression scale were higher (11 studies48,49,50,51,52,54,55,56,57,58,60; RR 1.11, 95% CI 1.06 to 1.17, I2 = 0%) than corresponding scores in the antidepressants group. Subgroup analysis showed that when the treatment period was longer than 8 weeks, these significant between-group differences disappeared for the depression scales including HAMD (2 studies50,57; MD −0.66, 95% CI −2.11 to 0.78, I2 = 0%), and for TERs based on depression scales (3 studies50,54,57; RR 1.05, 95% CI 0.91 to 1.21, I2 = 0%). To confirm the robustness of these results, sensitivity analyses were performed after excluding low quality RCTs, defined as those rated as low risk of bias in 3 or fewer of the 7 domains of the risk of bias tool. The superior effectiveness of SGYMT demonstrated by the depression scales including HAMD, and the TER, was consistent within 8 weeks of treatment (Supplemental Digital Content 4).

Figure 4

Forest plots for comparison of HAMD scores between SGYMT and pharmaceutical antidepressant groups. Subgroup analysis according to (a) treatment period, (b) dosage form, and (c) types of antidepressants. HAMD, Hamilton Depression Scale; SGYMT, Sihogayonggolmoryeo-tang.

The neurological functions evaluated by TER based on stroke scale (3 studies50,54,57; RR 1.31, 95% CI 1.15 to 1.49, I2 = 89%), NIHSS (2 studies51,53; MD −0.84, 95% CI −1.40 to −0.29, I2 = 19%), and CSS (3 studies50,52,57; MD −5.37, 95% CI −6.60 to −4.15, I2 = 43%), and the ADL evaluated by the Barthel index (3 studies50,52,60; MD 4.30, 95% CI 2.04 to 6.57, I2 = 66%) all showed significantly better results in the SGYMT group. In the subgroup analysis, when the treatment period was shorter than 4 weeks, the significant difference between the two groups disappeared both for TER based on stroke scale (1 study60; RR 1.06, 95% CI 0.94 to 1.19) and for NIHSS (1 study53; MD −0.37, 95% CI −1.37 to 0.63) (Supplemental Digital Content 5, showing forest plots for other outcomes compared between the SGYMT and antidepressant groups).

Wang and Li56 and Huang et al.51 reported modified Edinburgh-Scandinavian stroke scales and functional independence measures respectively as their outcomes, with the SGYMT group showing significantly better results relative to the control group (p < 0.05 for both studies). Moreover, Liu et al.53 reported significantly lower serum levels of interleukin-1β and tumor necrosis factor-α in the SGYMT group after 28 days of treatment (p < 0.05 for both comparisons).

Safety

There were significantly fewer AEs associated with SGYMT (6 studies48,50,51,52,56,59; RR 0.13, 95% CI 0.05 to 0.37, I2 = 0%) than with antidepressants (Fig. 5). In the subgroup analysis, significant differences between these two groups disappeared when the treatment period was longer than 8 weeks (1 study50; RR 0.29, 95% CI 0.07 to 1.21), or when SGYMT was administered as granules (1 study48; RR 0.07, 95% CI 0.00 to 1.13). However, sensitivity analysis performed by excluding low quality RCTs showed no significant difference between two groups when the treatment period was shorter than 4 weeks (1 study48; RR 0.07, 95% CI 0.00 to 1.13) or when the type of antidepressant consisted of flupentixol/melitracen (2 studies48,51; RR 0.07, 95% CI 0.00 to 1.13) (Supplemental Digital Content 4).

Figure 5

Forest plots for comparison of adverse events between SGYMT and pharmaceutical antidepressant groups. Subgroup analysis according to (a) treatment period, (b) dosage form, and (c) types of antidepressants. SGYMT, Sihogayonggolmoryeo-tang.

SGYMT combined with antidepressants versus antidepressants alone

Efficacy

The meta-analysis showed that the combination therapy group scored significantly lower on the HAMD (7 studies61,62,63,64,65,66,67; MD = −6.72, 95% CI = −11.42 to −2.01, I2 = 98%) (Fig. 6) and NIHSS (4 studies61,65,67,68; MD −3.03, 95% CI −3.60 to −2.45, I2 = 87%), and showed higher TER based on depression scales (3 studies62,63,67; RR 1.66, 95% CI 1.40 to 1.97, I2 = 94%) than did the antidepressants alone group (see Supplemental Digital Content 6, showing forest plots comparing other outcomes between SGYMT plus antidepressants and antidepressants only groups).

Figure 6

Forest plot for comparison of HAMD score between the SGYMT plus antidepressants group and the antidepressants alone group. Subgroup analysis according to (a) treatment period and (b) types of antidepressants. HAMD, Hamilton Depression Scale; SGYMT, Sihogayonggolmoryeo-tang.

Interestingly, significant differences in HAMD between treatment groups disappeared when the treatment period was longer than 4 weeks (4 studies62,63,65,67; MD −7.86, 95% CI −16.50 to 0.77, I2 = 99%) (Fig. 6). Sensitivity analysis performed by excluding low quality RCTs showed that the combination treatment was consistently more effective when the treatment lasted less than 4 weeks (1 study66; MD −4.04, 95% CI −6.51 to −1.57). In addition, the extremely high heterogeneity (I2 = 98%) in the HAMD scores was reduced to 0% as a result of the sensitivity analysis performed by excluding low-quality RCTs (Supplemental Digital Content 4).

Liu and Wang65 calculated TER using both depression and stroke scales, and reported that the two groups showed similar efficacies (29/30 for the combination group, 27/30 for the control group, no P-value reported). Lai et al.62 and Liu66 reported the Barthel index and generic quality of life inventory-74 as their outcomes. Using these measures, the combination group showed significantly better results than did the antidepressants alone group (p < 0.05 and p < 0.01, respectively).
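For readers who wish to see how a risk ratio and its 95% CI follow from responder counts such as those quoted above (29/30 versus 27/30), the standard log-scale method can be sketched as follows. This is an illustrative calculation only; the resulting interval was not reported by the original trial or computed in this review, and the function name is hypothetical.

```python
import math


def risk_ratio(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio with a confidence interval computed on the log scale.

    Assumes both event counts are non-zero.
    """
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Responder counts quoted in the text: 29/30 (combination) vs 27/30 (control).
rr, lower, upper = risk_ratio(29, 30, 27, 30)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that includes 1 would be in line with the description of similar efficacy in the two groups.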

Safety

No studies reported outcomes related to safety in this comparison.

Quality of evidence

In the comparison of SGYMT and antidepressants, the qualities of evidence were graded as “Very low” to “Moderate” (Table 2). Meanwhile, in the comparison of SGYMT combined with antidepressants and antidepressants alone, the qualities of evidence were graded as “Very low” to “Moderate” (Table 3). There was no high quality of evidence. The main reason for downgrading was the high risk of bias in the RCTs included in each meta-analysis. In addition, most findings were judged to have low precision because they did not satisfy the optimal sample size and had wide CIs. The indirectness of outcome measure also lowered the quality of evidence.

Table 2 Summary of findings: SGYMT compared with antidepressants.
Table 3 Summary of findings: SGYMT combined with antidepressants versus antidepressants alone.

Publication bias

No evidence of publication bias (distinct asymmetry) emerged from the funnel plots of TER based on depression scales comparing the efficacy of SGYMT with that of antidepressants alone. In addition, publication bias could not be proven using Egger’s method (P value for bias: 0.174) or Begg’s method (continuity corrected Z score: 0.78, continuity corrected P value: 0.436) (Fig. 7).
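The idea behind Egger’s regression can be sketched in a few lines: each study’s standardized effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel plot asymmetry. The sketch below is a plain least-squares illustration with hypothetical inputs, not the Stata routine used in this review, and it returns the intercept and its t statistic without a P value (which would require the t distribution with n − 2 degrees of freedom).

```python
import math


def egger_regression(effects, std_errors):
    """Ordinary least-squares fit of (effect / SE) on (1 / SE).

    Returns (intercept, slope, t_statistic_of_intercept).
    """
    x = [1.0 / se for se in std_errors]
    y = [e / se for e, se in zip(effects, std_errors)]
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 studies are needed")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and the standard error of the intercept.
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1.0 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, slope, t_stat


# Hypothetical log risk ratios and standard errors from five studies.
log_rr = [0.10, 0.15, 0.05, 0.20, 0.12]
se = [0.05, 0.08, 0.04, 0.12, 0.06]
print(egger_regression(log_rr, se))
```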

Figure 7

Results of the analysis of publication bias for comparison of TER based on the depression scale between the SGYMT and pharmaceutical antidepressant groups. (a) Funnel plot and (b) Egger’s regression plot. SGYMT, Sihogayonggolmoryeo-tang; TER, total effective rate.

Discussion

This review aimed to evaluate the effectiveness and safety of SGYMT as a monotherapy or adjunctive therapy to antidepressants for PSD. A comprehensive search yielded 21 RCTs that were suitable for inclusion in our review.

The findings of our analysis were as follows: (1) In the comparison between SGYMT and antidepressants, relative to pharmaceutical antidepressants, SGYMT monotherapy significantly alleviated depression measured by HAMD (MD −2.08, 95% CI −2.62 to −1.53, I2 = 34%), and TER based on depression scale (RR 1.11, 95% CI 1.06 to 1.17, I2 = 0%). However, subgroup analysis of treatment periods showed that such differences on HAMD (≤4 weeks: MD −1.98, 95% CI −3.13 to −0.83, I2 = 34%; >4 weeks, ≤8 weeks: MD −2.48, 95% CI −3.04 to −1.93, I2 = 0%) and TER (≤4 weeks: RR 1.11, 95% CI 1.04 to 1.18, I2 = 16%; >4 weeks, ≤8 weeks: RR 1.21, 95% CI 1.06 to 1.39, I2 = 0%) were only evident for treatment periods shorter than 8 weeks, a result consistent with that of the sensitivity analysis performed after exclusion of low quality RCTs. Additionally, the SGYMT group showed significant improvement of neurological functions evaluated by TER based on stroke scale (RR 1.31, 95% CI 1.15 to 1.49, I2 = 89%), NIHSS (MD −0.84, 95% CI −1.40 to −0.29, I2 = 19%), and CSS (MD −5.37, 95% CI −6.60 to −4.15, I2 = 43%). Differences that emerged from this comparison were sustained when treatment periods were longer than 4 or 8 weeks, for the TER (>8 weeks: RR 1.80, 95% CI 1.37 to 2.37, I2 = 0%) and NIHSS outcome measures (>4 weeks, ≤8 weeks: MD −1.05, 95% CI −1.71 to −0.39). These results suggest that the effectiveness of SGYMT for treatment of PSD has a different time trajectory relative to that of antidepressants. (2) In the comparison between SGYMT combined with antidepressants and antidepressants alone, the combined treatment also significantly improved depression evaluated by HAMD (MD = −6.72, 95% CI = −11.42 to −2.01, I2 = 98%) and TER based on depression scale (RR 1.66, 95% CI 1.40 to 1.97, I2 = 94%); however, the benefits assessed using the HAMD were sustained only for treatment periods shorter than 4 weeks (MD = −5.65, 95% CI = −6.45 to −4.86, I2 = 0%).
These results are consistent with comparisons between SGYMT monotherapy and antidepressants, suggesting that SGYMT may alleviate the symptoms of PSD more rapidly than do pharmaceutical antidepressants. Moreover, the combination treatment group showed more marked improvement of neurological function evaluated by NIHSS (MD −3.03, 95% CI −3.60 to −2.45, I2 = 87%) than did the group treated with antidepressants alone. (3) Regarding the safety data, only six RCTs48,50,51,52,56,59, comparing SGYMT with antidepressants reported the incidence of AEs. The SGYMT group showed significantly fewer AEs than did the antidepressants group (RR 0.13, 95% CI 0.05 to 0.37, I2 = 0%), regardless of the types of antidepressants compared (SSRI: RR 0.14, 95% CI 0.03 to 0.65, I2 = 30%; flupentixol/melitracen: RR 0.07, 95% CI 0.01 to 0.53, I2 = 0%). However, this difference disappeared when treatment periods were longer than 8 weeks (RR 0.29, 95% CI 0.07 to 1.21), or when SGYMT was administered as granules (RR 0.07, 95% CI 0.00 to 1.13). Additionally, sensitivity analysis performed by excluding low quality RCTs showed that the significant difference disappeared when treatment period was shorter than 4 weeks. Altogether, these results suggest that SGYMT may be consistently more effective and safer than antidepressants over treatment periods of 4 to 8 weeks. (4) The methodological quality of the included studies and the strength of evidence were generally poor. The Cochrane risk of bias tool showed that only 8 of 21 included trials used and reported appropriate methods of random sequence generation. Moreover, no studies reported allocation concealment; blinding of participants, personnel, and outcome assessors; or use of placebo designs. Moreover, because none of the studies available for our meta-analysis had previously published a study protocol, their results may be selectively reported and/or biased. 
We also assessed the quality of the included RCTs using the Jadad scale; the mean score was 2.38, which indicated that the quality of the studies included in this review was generally low. The quality of evidence assessed by GRADE ranged from “Very low” to “Moderate”, and there was no “High” quality evidence. This means that the evidence comparing SGYMT and antidepressants could be substantially improved by the addition of high-quality research.
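The way the pooled estimates above are read can be illustrated with a minimal sketch: a pooled effect is conventionally treated as statistically significant when its 95% CI excludes the null value (1 for a risk ratio, 0 for a mean difference). The function name `excludes_null` and the data layout are illustrative assumptions; the numbers are copied from the safety and HAMD results reported above.

```python
def excludes_null(ci_low, ci_high, null_value):
    """Return True when the confidence interval does not contain the null value."""
    return not (ci_low <= null_value <= ci_high)

# Null value for each effect measure: RR = 1 means no difference in risk,
# MD = 0 means no difference in means.
NULL_VALUE = {"RR": 1.0, "MD": 0.0}

# (label, measure, point estimate, CI lower, CI upper), taken from the text above.
estimates = [
    ("AEs, overall", "RR", 0.13, 0.05, 0.37),
    ("AEs, >8 weeks", "RR", 0.29, 0.07, 1.21),
    ("AEs, granules", "RR", 0.07, 0.00, 1.13),
    ("HAMD, monotherapy", "MD", -2.08, -2.62, -1.53),
]

results = {
    label: excludes_null(low, high, NULL_VALUE[measure])
    for label, measure, _point, low, high in estimates
}

for label, significant in results.items():
    print(f"{label}: {'significant' if significant else 'not significant'}")
```

Under this reading, the overall AE difference and the HAMD difference are significant, whereas the AE comparisons for treatment periods longer than 8 weeks and for granules are not, because their intervals cross 1.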

Although a definite conclusion could not be drawn because of the low quality of the included studies and of the evidence, our findings suggest the following implications for SGYMT use. First, as an alternative or adjunctive therapy, SGYMT might have antidepressant effects, especially within the first 4 to 8 weeks of treatment. Second, SGYMT may improve neurological function and the ADL of PSD patients, which are difficult to improve with conventional antidepressants17. Third, SGYMT was associated with fewer AEs, especially for treatment periods of 4 to 8 weeks. However, all these implications are hypothetical and cannot be confirmed by our results.

As a modality of complementary and alternative medicine, HM has been regarded as a potential replacement or supplement for conventional medicine in various pathological conditions, including psychiatric disorders such as depression, insomnia, and schizophrenia70,71,72,73. The mechanism by which SGYMT, a well-known classical herbal medicine, acts in the treatment of PSD is not fully understood; however, relevant mechanisms have been reported for some of its key herbs. For example, Bupleuri Radix, a key component of the SGYMT prescription, is known to reduce neuro-inflammation74 and oxidative stress75, and to increase concentrations of nerve growth factor and brain-derived neurotrophic factor76. All these mechanisms are associated with the etiology of depression. Scutellariae Radix, another key component of this prescription, alleviates depression through several complex molecular mechanisms77, thereby complementing the action of Bupleuri Radix. Some HMs such as Chai Hu Shu Gan San and Xiao Yao San, which include Bupleuri Radix as a key component, have significant therapeutic effects on depression70,78. Other components of SGYMT, including Ginseng Radix, also have antidepressant effects79,80,81. Moreover, the multiple components of HM may exert a complex effect on multiple molecular targets23. Thus, HM such as SGYMT may help to improve neurological symptoms in addition to alleviating the symptoms of depression in PSD patients.

The following limitations should be kept in mind when interpreting the results of this meta-analysis. First, because all studies reviewed were conducted in China, the general applicability of the results may be limited. Second, the quality of the included studies is generally low, particularly with respect to the lack of placebo-controlled trials. Therefore, the possibility that our study overestimated the effectiveness of SGYMT cannot be ruled out. The low quality of the included studies implies that the reliability of our results is very low. In other words, our results should be interpreted with great caution, considering that they may change markedly according to the results of future rigorous research. Furthermore, the popularity of HM in China may have elevated Chinese participants’ expectations of SGYMT. In studies comparing SGYMT combined with antidepressants with antidepressants alone, participants are likely to have had high expectations of the former treatment, possibly increasing the placebo effect. Third, our protocol planned a subgroup analysis according to the severity of depression, but this could not be carried out because too few studies included criteria assessing the severity of depression. Fourth, only four of the included studies recruited PSD patients with a specific TCM pattern. The TCM pattern can be used in conjunction with the diagnosis of the disease; this so-called “disease-syndrome combination” can help to fully exploit the advantages of HM82 and supports individualized treatment. Finally, the control groups of the included studies were prescribed antidepressants regardless of their type, which led to distinct clinical heterogeneity. 
Although we conducted careful subgroup analyses according to each type of drug, the number of studies included was not sufficient to quantify the effect size of SGYMT relative to each type of antidepressant or to explain the heterogeneity adequately.

Suggestions for future research are as follows. Further high-quality RCTs on the efficacy of SGYMT for treating PSD are needed, particularly in countries other than China, where wide acceptance of HM for the treatment of PSD may positively bias the results of comparisons with pharmaceutical antidepressants. Accordingly, when planning these studies, it is necessary to consider stratified randomization or post hoc adjustment that reflects participants’ expectations of HM, to limit potential placebo effects. Moreover, placebo-controlled trials are essential to assess the efficacy and safety of SGYMT objectively. To optimize the use of SGYMT in PSD treatment, future studies should characterize participants in greater detail than was possible in our analysis, particularly the severity of their depression and their TCM patterns. In particular, individual characteristics are an important component of HM practice, so it is necessary to establish a subgroup of PSD patients with personalized medicine profiles suitable for the administration of SGYMT. TCM patterns may be useful in this selection process. Furthermore, it is important to obtain ethical approval from an institutional review board (IRB) before conducting clinical research to protect the dignity, rights, and welfare of research participants, in line with World Health Organization guidelines83. It is also important to explain to the participants the purpose, content, and method of the research, as well as its potential benefits and risks; informed consent should be obtained from participants in all clinical research studies. In addition, studies using health insurance data in China, Japan, Korea, and Taiwan, where health insurance covers HM, may enlarge the database and help specify the indications for SGYMT. Finally, the multi-compound multi-target aspect of HM has the potential to contribute to the improvement of both neurological function and depressive symptoms. 
A comprehensive experimental study of the underlying molecular mechanism of action of SGYMT is needed.
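The stratified randomization suggested above could be sketched, under stated assumptions, as permuted-block randomization within strata. Everything specific here is hypothetical and not drawn from any included trial: the strata ("high" or "low" baseline expectation of HM), the arm labels, the block size, and the function name `stratified_block_randomize`.

```python
import random
from collections import defaultdict

def stratified_block_randomize(participants, arms=("SGYMT", "control"),
                               block_size=4, seed=0):
    """Allocate participants to arms using permuted blocks within each stratum.

    participants: list of (participant_id, stratum) tuples.
    Returns a dict mapping participant_id to the allocated arm.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible

    # Group participant IDs by stratum, preserving enrolment order.
    by_stratum = defaultdict(list)
    for participant_id, stratum in participants:
        by_stratum[stratum].append(participant_id)

    allocation = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        for start in range(0, len(ids), block_size):
            # Each block holds every arm equally often, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            for participant_id, arm in zip(ids[start:start + block_size], block):
                allocation[participant_id] = arm
    return allocation

# Hypothetical cohort: 16 participants with high and 8 with low expectation of HM.
participants = [(f"P{i:02d}", "low" if i % 3 == 0 else "high") for i in range(1, 25)]
allocation = stratified_block_randomize(participants)

# Count allocations per (stratum, arm) to check balance within each stratum.
counts = defaultdict(int)
for participant_id, stratum in participants:
    counts[(stratum, allocation[participant_id])] += 1
print(dict(counts))
```

Because every complete block contains each arm equally often, the arms stay balanced within each expectation stratum, so a difference in expectation between arms is less likely to be mistaken for a treatment effect.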

In conclusion, current evidence suggests that SGYMT, either as a monotherapy or as an adjunctive therapy combined with antidepressants, might have potential benefits for the treatment of PSD. However, since the methodological quality of the included studies was poor and there were no large, placebo-controlled trials to ensure freedom from bias, the effects estimated in this meta-analysis may be overestimated; thus, it remains difficult to draw definitive conclusions on this topic. Further well-designed RCTs are needed to confirm these results.