Introduction

Bipolar disorder (BD) is a common chronic mental disorder and a major contributor to the global burden of disease, with a worldwide prevalence of ~1% [1,2,3]. Patients with BD repeatedly and irregularly present mania/hypomania or depression during their lifetimes, which can result in social and occupational disability [4].

Pharmacological treatments are among the primary treatments for BD [4, 5]. The most recent guidelines state that clinicians and patients should take the maintenance phase into account when selecting acute phase treatments [6]. A previous network meta-analysis (NMA) reported that, compared with placebo, lithium and quetiapine reduced the recurrence or relapse rate of any mood, depressive, or manic/hypomanic/mixed episodes [7]. Recently, aripiprazole once monthly (AOM) and asenapine were approved for the treatment of BD [8]. We performed a systematic review and NMA of the efficacy, tolerability, and safety of antipsychotics and/or mood stabilizers, and we conducted a risk-benefit analysis of each medication for patients with BD in the maintenance phase.

Methods

This study was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA Checklist) [9] and was registered on Open Science Framework (https://osf.io/h4nwp). The literature search, data extraction, and data input into spreadsheets for analysis were performed simultaneously and independently by at least two authors (TK, TI, YM, KS, and MO). The authors double-checked the accuracy of data transfer and calculations in the study.

Search strategy and inclusion criteria

The information about the literature search is shown in Supplementary Fig. 1. Inclusion criteria were (1) randomized controlled trials (RCTs) of antipsychotics and/or mood stabilizers lasting at least 12 weeks; (2) studies including adult patients with any BD subtype in the maintenance phase; (3) studies including patients with any mood symptoms at recruitment; (4) open studies and those with any level of blinding; and (5) studies with or without an enrichment design. Exclusion criteria were (1) studies with child/adolescent patients with BD; (2) continuation studies which randomly assigned patients with acute symptoms to treatment groups; and (3) monotherapy and/or combination therapy studies of antidepressants with mood stabilizers or antipsychotics.

Data synthesis and outcome measures

The primary outcome was recurrence/relapse rate of any mood episode. Secondary outcomes were recurrence/relapse rate of depressive episodes, recurrence/relapse rate of manic/hypomanic/mixed episodes, all-cause discontinuation, and discontinuation rate due to adverse events. Other outcomes were mortality rate and incidence of individual adverse events. Divalproex was classified as part of the valproate group. Definitions of recurrence/relapses are shown in Supplementary Table 1.

Data extraction

We analyzed the extracted data based on intention-to-treat or modified intention-to-treat principles. When data required for meta-analysis were missing in the articles, we searched for these data in published systematic review articles. Although we attempted to contact the original study investigators to obtain unpublished data, we did not succeed in obtaining these data from all of them.

Meta-analysis methods

Based on the results of our literature search (Supplementary Fig. 1 and Supplementary Table 1), we planned to perform two categorical NMAs. The first included (1) placebo-controlled and head-to-head trials of monotherapy of antipsychotics and/or mood stabilizers, and (2) combination or augmentation studies in which the two drugs used were specified. The second NMA included studies in which second-generation antipsychotics (SGAs) combined with lithium or valproate (LIT/VAL) were compared with placebo-LIT/VAL. A Bayesian NMA based on random-effects models [10] was conducted using the netmeta package [11]. We fitted random-effects frequentist NMAs, in which we assumed a common random-effects standard deviation for all comparisons in the network. The risk ratio (RR) and 95% credible interval (95% CI) were calculated. The heterogeneity standard deviation was also calculated for all outcomes. The odds ratios and their 95% CIs were calculated for mortality rate and completed suicide rate because incidences of these outcomes were very rare (Supplementary Appendix 1.6–1.7). We assessed network heterogeneity using τ² with the netmeta package. We conducted a statistical evaluation of consistency using the design-by-treatment test (globally) and the node-splitting approach or Separate Direct from Indirect Evidence test (locally). The Bayesian analyses also estimated rank probabilities (i.e., probability of each treatment obtaining each possible rank as shown by their relative effects). The surface under the cumulative ranking curve was calculated to rank the interventions. We also performed a meta-regression analysis in the first NMA to examine whether some potentially confounding factors (e.g., publication year, duration of study, number of total patients, percent female, and mean age) were associated with the extent of effect on primary and secondary outcomes.
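
To make the effect measure concrete, the following is a minimal sketch of how a risk ratio, its 95% interval, and the heterogeneity variance τ² can be obtained for a single pairwise comparison. It uses a DerSimonian-Laird random-effects estimator, which is an assumption chosen for illustration and is not the network model fitted with netmeta in this study; the function names and the example counts are hypothetical.

```python
import math


def log_rr(events_t, n_t, events_c, n_c):
    """Log risk ratio and its approximate variance for one two-arm study."""
    rr = (events_t / n_t) / (events_c / n_c)
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), var


def pool_random_effects(studies):
    """Pool two-arm studies given as (events_t, n_t, events_c, n_c) tuples.

    Returns the pooled RR, a 95% interval, and the DerSimonian-Laird tau^2.
    """
    effects = [log_rr(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(mu),
        "lower": math.exp(mu - 1.96 * se),
        "upper": math.exp(mu + 1.96 * se),
        "tau2": tau2,
    }


# Hypothetical relapse counts: (events drug, n drug, events placebo, n placebo).
example = pool_random_effects([(10, 100, 20, 100), (15, 150, 30, 150)])
```

An RR below 1 with an upper bound below 1 corresponds to what the Results describe as a drug "outperforming" placebo for a relapse outcome.
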
In addition to the analyses conducted previously [7], we also performed sensitivity analyses for primary and secondary outcomes in the first NMA, in which we gave only half the weight to (1) studies that included both patients with bipolar disorder I (BDI) and with other BD (when focusing on studies including only patients with BDI); (2) studies that included rapid-cycling patients with BD (when focusing on studies including only non-rapid-cycling patients with BD because rapid-cycling BD is considered to be more difficult to stabilize than non-rapid-cycling BD); (3) non-double-blind studies (when focusing on double-blind studies); (4) study arms that were “enriched” (when focusing on nonenriched studies); and (5) study arms supported by industry sponsors (when focusing on non-industry sponsorship studies) [12]. We did not perform meta-regression and sensitivity analyses in the second NMA because only six studies were included. In addition, the methodological quality of the included articles was assessed according to the Cochrane Risk of Bias criteria [13]. Funnel plots were used to explore potential publication bias. Lastly, we incorporated results into the Confidence in Network Meta-Analysis (CINeMA) application to assess the credibility of findings from each NMA [14]. CINeMA grades the confidence in results of each treatment comparison as high, moderate, low, or very low.
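
The half-weighting used in the sensitivity analyses can be pictured with a simple inverse-variance average. The sketch below is only an illustration under that assumption: the function name, the `factor` parameter, and the example values are hypothetical, and the design-adjusted network model actually used may differ in detail.

```python
def downweighted_pooled_effect(effects, variances, downweight_flags, factor=0.5):
    """Inverse-variance pooled effect with flagged studies given reduced weight.

    effects: study-level log risk ratios
    variances: their variances
    downweight_flags: True for studies with the feature of concern
                      (e.g., enriched design or industry sponsorship)
    factor: multiplier applied to the weight of flagged studies (0.5 = half weight)
    """
    weights = [
        (factor if flagged else 1.0) / var
        for var, flagged in zip(variances, downweight_flags)
    ]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, weights


# Hypothetical example: the second study is flagged and counts half as much.
pooled, weights = downweighted_pooled_effect(
    effects=[-0.5, -0.1],
    variances=[0.04, 0.04],
    downweight_flags=[False, True],
)
```

Halving a study's weight is equivalent to doubling its variance, so flagged studies still contribute to the estimate but pull it less strongly.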

Results

Study characteristics

A flow diagram of the literature search is shown in Supplementary Fig. 1. We eliminated 2724 articles based on a review of the abstract and/or title. A review of the full texts of the remaining 59 articles resulted in the elimination of a further 21 articles. This left 38 articles for inclusion in the analysis [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52]. Three additional studies [53,54,55] were identified following a manual search through the reference lists of the previous review article [7]. No further studies were found in the clinical trial registers. Although two studies included antidepressant treatment arms [15, 28], these studies were included in the NMA because they had both a lithium arm and a placebo arm. Hence, 41 studies, including a total of 9821 patients, with mean study duration of 70.5 ± 36.6 weeks, were identified and included in this study. Characteristics of these studies are shown in Supplementary Table 1. The percent female was 54.1%, and the mean age was 40.7 years. Twenty-three studies included only patients with BDI. Just four studies included only patients who had depressive episodes at recruitment. Sixteen studies included patients with rapid-cycling BD and 25 studies used enrichment designs. One perphenazine study [51] and two risperidone long-acting injection (RISLAI) studies [43, 44] were not included in the NMA because none of their arms connected to the treatment arms of other studies [51]. Detailed methodological quality analyses of the studies based on the Cochrane Risk of Bias criteria are presented in Supplementary Fig. 2. Three studies were open-label studies [27, 30, 43]. Twenty-nine studies were industry-sponsored studies. Supplementary Tables 2.1 and 2.2 show the results of the primary outcome in the individual studies included in our systematic review.

AOM, aripiprazole, aripiprazole+lamotrigine, aripiprazole+valproate, asenapine, carbamazepine, lamotrigine, lamotrigine+valproate, lithium, lithium+oxcarbazepine, lithium+valproate, olanzapine, paliperidone, quetiapine, RISLAI, valproate, and placebo arms were included in the first NMA (32 studies and 7113 patients). Aripiprazole, lurasidone, quetiapine, olanzapine, or ziprasidone combined with LIT/VAL and LIT/VAL arms were included in the second NMA (6 studies and 2498 patients).
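
The study counts reported above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the total number of screened records is derived here rather than reported.

```python
# Counts taken from the study selection described in the text.
excluded_on_title_abstract = 2724
full_texts_reviewed = 59
excluded_on_full_text = 21
found_by_manual_search = 3
not_connected_to_network = 3  # one perphenazine study and two RISLAI studies
first_nma_studies = 32
second_nma_studies = 6

# Derived quantities.
records_screened = excluded_on_title_abstract + full_texts_reviewed
from_database_search = full_texts_reviewed - excluded_on_full_text
included_in_review = from_database_search + found_by_manual_search
included_in_nmas = included_in_review - not_connected_to_network

# The two networks together account for every study that entered an NMA.
assert from_database_search == 38
assert included_in_review == 41
assert included_in_nmas == first_nma_studies + second_nma_studies
```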

Results of the first network meta-analysis

Results of the first NMA are shown in Supplementary Appendix 1.1–1.17.

Primary and secondary outcomes

AOM, aripiprazole, aripiprazole+lamotrigine, aripiprazole+valproate, asenapine, lamotrigine, lithium, lithium+oxcarbazepine, lithium+valproate, olanzapine, quetiapine, RISLAI, and valproate outperformed placebo for recurrence/relapse rate of any mood episode (Table 1, Fig. 1). The RR (95% CI) for drugs that significantly lowered recurrence/relapse rates of any mood episode ranged from 0.262 (0.133–0.517) for asenapine to 0.764 (0.628–0.930) for lamotrigine (29 RCTs, 6890 patients; Table 1, Fig. 1). Asenapine outperformed aripiprazole, carbamazepine, lamotrigine, lithium, paliperidone, RISLAI, and valproate. Aripiprazole+valproate, olanzapine, and quetiapine outperformed lamotrigine and paliperidone (Table 1).

Table 1 Head-to-head comparisons for recurrence/relapse of any mood episode.
Fig. 1: Recurrence/relapse rate of any mood episode.

Drugs were compared with placebo. To visualize heterogeneity, we used prediction intervals in the forest plot. The confidence level estimated by CINeMA is shown next to 95% PI (L: low, M: moderate, VL: very low). 95% CI: 95% credible interval, 95% PI: prediction interval, CR: confidence rating, RR: risk ratio. AOM aripiprazole once monthly, ARI aripiprazole, ASE asenapine, CAR carbamazepine, LAM lamotrigine, LIT lithium, OLA olanzapine, OXC oxcarbazepine, PAL paliperidone, QUE quetiapine, RISLAI risperidone long-acting injectable, VAL valproate.

Aripiprazole+valproate, lamotrigine, lamotrigine+valproate, lithium, olanzapine, and quetiapine outperformed placebo for recurrence/relapse rate of depressive episodes, with RR (95% CI) ranging from 0.273 (0.076–0.986) for aripiprazole+valproate to 0.791 (0.660–0.948) for lithium (25 RCTs, 6438 patients; Fig. 2a). Aripiprazole+valproate outperformed carbamazepine, paliperidone, and RISLAI. Lamotrigine outperformed paliperidone and RISLAI. Lamotrigine+valproate outperformed AOM, carbamazepine, paliperidone, and RISLAI. Lithium and olanzapine outperformed RISLAI. Quetiapine outperformed AOM, carbamazepine, lamotrigine, lithium, olanzapine, paliperidone, RISLAI, and valproate.

Fig. 2: Recurrence/relapse rate of depressive episodes and manic/hypomanic/mixed episodes.

a Recurrence/relapse rate of depressive episodes. b Recurrence/relapse rate of manic/hypomanic/mixed episodes. Drugs were compared with placebo. To visualize heterogeneity, we used prediction intervals in the forest plot. The confidence level estimated by CINeMA is shown next to 95% PI (L: low, M: moderate, VL: very low). 95% CI: 95% credible interval, 95% PI: prediction interval, CR confidence rating, RR risk ratio. AOM aripiprazole once monthly, ARI aripiprazole, ASE asenapine, CAR carbamazepine, LAM lamotrigine, LIT lithium, OLA olanzapine, OXC oxcarbazepine, PAL paliperidone, QUE quetiapine, RISLAI risperidone long-acting injectable, VAL valproate.

All active treatments other than aripiprazole+valproate, carbamazepine, lamotrigine, and lamotrigine+valproate outperformed placebo for recurrence/relapse rate of manic/hypomanic/mixed episodes, with RR (95% CI) ranging from 0.208 (0.082–0.529) for asenapine to 0.640 (0.477–0.857) for valproate (25 RCTs, 6438 patients; Fig. 2b). AOM outperformed lamotrigine and valproate. Asenapine outperformed carbamazepine, lamotrigine, lithium, paliperidone, quetiapine, and valproate. Lithium outperformed lamotrigine. Lithium+valproate outperformed lamotrigine and valproate. Olanzapine outperformed lamotrigine, lithium, paliperidone, quetiapine, and valproate. Quetiapine outperformed lamotrigine. RISLAI outperformed lamotrigine, lithium, quetiapine, and valproate.

Asenapine, lithium, olanzapine, quetiapine, and valproate were associated with lower all-cause discontinuation compared with placebo, with RR (95% CI) ranging from 0.450 (0.270–0.750) for asenapine to 0.837 (0.725–0.966) for lithium (29 RCTs, 6988 patients; Fig. 3a). Asenapine outperformed aripiprazole, carbamazepine, lamotrigine, lithium, paliperidone, and valproate. Quetiapine was outperformed by carbamazepine.

Fig. 3: All-cause discontinuation and discontinuation rate due to adverse events.

a All-cause discontinuation. b Discontinuation rate due to adverse events. Drugs were compared with placebo. To visualize heterogeneity, we used prediction intervals in the forest plot. The confidence level estimated by CINeMA is shown next to 95% PI (L: low, M: moderate, VL: very low). 95% CI: 95% credible interval, 95% PI: prediction interval, CR confidence rating, RR risk ratio. AOM aripiprazole once monthly, ARI aripiprazole, ASE asenapine, CAR carbamazepine, LAM lamotrigine, LIT lithium, OLA olanzapine, OXC oxcarbazepine, PAL paliperidone, QUE quetiapine, RISLAI risperidone long-acting injectable, VAL valproate.

Only asenapine was associated with a lower discontinuation due to adverse events compared with placebo, with RR (95% CI) 0.363 (0.162–0.812) (21 RCTs, 6107 patients; Fig. 3b). Lithium and lithium+valproate were associated with higher discontinuation due to adverse events compared with placebo, with RR (95% CI) 2.238 (1.430–3.502) and 3.651 (1.234–10.801), respectively (21 RCTs, 6107 patients; Fig. 3b). Asenapine outperformed AOM, aripiprazole, carbamazepine, lithium, lithium+oxcarbazepine, lithium+valproate, olanzapine, quetiapine, and RISLAI. Lamotrigine outperformed AOM, aripiprazole, carbamazepine, lithium, and lithium+valproate. Quetiapine outperformed lithium. Valproate outperformed lithium+valproate.

Aripiprazole+valproate ranked first for reduction of the recurrence/relapse rate of any mood episode and depressive episodes. Asenapine ranked as the best treatment for reducing manic/hypomanic/mixed episodes and discontinuation due to adverse events. Lithium+valproate had the lowest incidence of all-cause discontinuation. Supplementary Appendix 2.1–2.3 shows two-dimensional graphs of the primary and secondary outcomes.
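
Rankings of this kind are commonly summarized by the surface under the cumulative ranking curve, computed from each treatment's rank probabilities. The following is a minimal sketch of that calculation; the function name and the probabilities are hypothetical and are not taken from the study data.

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for one treatment.

    rank_probs[k] is the probability that the treatment takes rank k + 1,
    where rank 1 is best; the probabilities should sum to 1.
    Returns a value in [0, 1]: 1 means always best, 0 means always worst.
    """
    n_treatments = len(rank_probs)
    cumulative = 0.0
    area = 0.0
    # Sum the cumulative probabilities over the first n - 1 ranks.
    for p in rank_probs[:-1]:
        cumulative += p
        area += cumulative
    return area / (n_treatments - 1)


# Hypothetical three-treatment network.
always_best = sucra([1.0, 0.0, 0.0])
always_worst = sucra([0.0, 0.0, 1.0])
mixed = sucra([0.5, 0.3, 0.2])
```

A treatment supported by a single small study can still obtain a high value, which is why the rankings are read alongside the CINeMA confidence ratings.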

Meta-regression analysis of primary and secondary efficacy outcomes

A significant association between the extent of effect on the recurrence/relapse rate of manic/hypomanic/mixed episodes and the duration of study was detected (beta = –0.497; 95% CI = –0.985, –0.004; p < 0.001). The heterogeneity variance of the meta-regression analysis was reduced by 21% compared with the unadjusted analysis. Although the unadjusted analysis demonstrated that aripiprazole, aripiprazole+lamotrigine, and paliperidone outperformed placebo in the recurrence/relapse rate of manic/hypomanic/mixed episodes, these differences were not statistically significant in the meta-regression analysis. We did not find any associations between the extent of effect in primary and other secondary outcomes and potentially confounding factors (Supplementary Appendix 1.1–1.5).

Sensitivity analyses for primary and secondary outcomes

Relative reduction in heterogeneity variance for recurrence/relapse of any mood episodes for sensitivity analyses focusing on studies including only non-rapid-cycling patients with BD, nonenriched studies, and those not sponsored by industry were 29%, 21%, and 29%, respectively (Supplementary Appendix 1.1). Although outcomes with aripiprazole and aripiprazole+valproate were superior to placebo in the unadjusted analysis, the results did not reach statistical significance in the sensitivity analyses. The results of other comparisons for this outcome in the unadjusted and sensitivity analyses were similar. We did not detect relative reductions in heterogeneity variance for other outcomes in any of the sensitivity analyses (Supplementary Appendix 1.2–1.5).
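
The relative reductions in heterogeneity variance reported for the meta-regression and sensitivity analyses compare τ² before and after adjustment. A minimal sketch of that comparison follows, assuming the conventional definition (unadjusted minus adjusted, divided by unadjusted); the function name and the τ² values are hypothetical.

```python
def relative_reduction(tau2_unadjusted, tau2_adjusted):
    """Proportion of heterogeneity variance removed by the adjustment."""
    if tau2_unadjusted <= 0:
        raise ValueError("unadjusted tau^2 must be positive")
    return (tau2_unadjusted - tau2_adjusted) / tau2_unadjusted


# Hypothetical values: tau^2 falls from 0.100 to 0.071, a 29% reduction.
reduction = relative_reduction(0.100, 0.071)
```

A value near zero means the adjustment explained little of the between-study variability.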

Mortality rate and incidence of individual adverse events

Mortality and completed suicide rates were low and similar for all treatments. Aripiprazole was associated with a higher incidence of extrapyramidal symptoms/use of anticholinergic agents compared with carbamazepine. Lithium was associated with a higher incidence of extrapyramidal symptoms/use of anticholinergic agents compared with placebo, carbamazepine, lamotrigine, olanzapine, and quetiapine. Valproate was associated with a higher incidence of extrapyramidal symptoms/use of anticholinergic agents compared with placebo, carbamazepine, lamotrigine, and quetiapine. Olanzapine was associated with a higher incidence of somnolence compared with placebo, lamotrigine, and lithium. Olanzapine and quetiapine were associated with a lower incidence of insomnia compared with placebo, lamotrigine, and lithium. RISLAI was associated with a higher incidence of prolactin-related adverse events compared with placebo. Lithium was associated with a higher incidence of dry mouth compared with valproate, and quetiapine was associated with a higher incidence of dry mouth compared with placebo and valproate. Lamotrigine, lithium, olanzapine, quetiapine, valproate, and placebo were associated with a higher incidence of headache compared with RISLAI. Valproate was associated with a higher incidence of headache compared with AOM. Lamotrigine was associated with a higher incidence of nausea compared with quetiapine. Lithium was associated with a higher incidence of nausea compared with placebo, olanzapine, and quetiapine. Valproate was associated with a higher incidence of nausea compared with placebo and quetiapine. Lithium was associated with a higher incidence of diarrhea compared with placebo and lamotrigine.

Heterogeneity, inconsistency, and results of the first network meta-analysis graded using the CINeMA system

Global heterogeneity was low to moderate for most outcomes other than insomnia, dry mouth, and increased weight (Supplementary Appendix 1.1–1.17). We also did not detect considerable heterogeneity for most of the outcomes in certain comparisons (Supplementary Appendix 1.1–1.17). We did not find significant global inconsistencies in the primary and secondary outcomes. The percentages of inconsistent loops for the recurrence/relapse of any mood episode, depressive episodes, manic/hypomanic/mixed episodes, all-cause discontinuation, and discontinuation due to adverse events were 0%, 13.6%, 9.1%, 0%, and 0%, respectively. However, we detected global inconsistency in insomnia and increased weight. We did not analyze global inconsistencies in prolactin-related adverse events and dry mouth due to insufficient data. Funnel plots with fewer than ten studies might not be meaningful. The confidence in evidence was often low or very low.

Results of the second network meta-analysis

Results of the second NMA are shown in Supplementary Appendix 3.1–3.11. Aripiprazole+LIT/VAL, lurasidone+LIT/VAL, quetiapine+LIT/VAL, and ziprasidone+LIT/VAL were superior to placebo+LIT/VAL in the recurrence/relapse rate of any mood episode. Moreover, lurasidone+LIT/VAL and quetiapine+LIT/VAL were superior to olanzapine+LIT/VAL. Lurasidone+LIT/VAL and quetiapine+LIT/VAL were superior to placebo+LIT/VAL in the recurrence/relapse rate of depressive episodes, and lurasidone+LIT/VAL and quetiapine+LIT/VAL were superior to aripiprazole+LIT/VAL and ziprasidone+LIT/VAL. Aripiprazole+LIT/VAL and quetiapine+LIT/VAL were superior to placebo+LIT/VAL in the recurrence/relapse rate of manic/hypomanic/mixed episodes, and lurasidone+LIT/VAL and quetiapine+LIT/VAL were associated with lower all-cause discontinuation compared with placebo+LIT/VAL. Quetiapine+LIT/VAL was associated with a higher incidence of somnolence compared with placebo+LIT/VAL. Olanzapine+LIT/VAL and quetiapine+LIT/VAL were associated with a lower incidence of insomnia compared with placebo+LIT/VAL. Olanzapine+LIT/VAL and quetiapine+LIT/VAL were associated with a higher incidence of increased weight compared with placebo+LIT/VAL and aripiprazole+LIT/VAL. We did not examine local heterogeneity, and global and local inconsistency for any outcomes in the second NMA due to insufficient data. The confidence in evidence of the second NMA was very low.

Discussion

We performed a systematic review and NMAs of efficacy, acceptability, tolerability, and safety for mono- or combination therapies using mood stabilizers and/or antipsychotics in the treatment of adult patients with BD in the maintenance phase. We extended a previous NMA by adding two SGAs (i.e., asenapine and AOM), by investigating many more adverse effects, and by examining the efficacy and safety of various combination therapies using SGA and LIT/VAL [7]. Overall, most of the mood stabilizers and/or antipsychotics reduced the recurrence/relapse rates of any mood episode. However, when examining individual mood symptoms, both drug types appeared to be more effective for treating mania than depression.

Aripiprazole+valproate was the best treatment for reducing the recurrence/relapse rates of any mood episode and depressive episodes. However, statistical significance was lost in the sensitivity analyses adjusting for enrichment design and sponsorship. Lithium+oxcarbazepine ranked high with respect to reducing the recurrence/relapse rates of any mood episode (2nd), depressive episodes (2nd), and manic/hypomanic/mixed episodes (3rd). Lamotrigine+valproate ranked third for reducing the recurrence/relapse rate of depressive episodes. However, these results were based on only one small study (<50 patients in each treatment arm). Lithium+valproate ranked first for all-cause discontinuation, based on the results of a single open-label study. We deemed these results inconclusive, given that the CINeMA rating showed low and very low confidence levels for these treatments.

Asenapine ranked high with respect to reducing the recurrence/relapse rates of any mood episode (3rd), manic/hypomanic/mixed episodes (1st), all-cause discontinuation (3rd), and discontinuation due to adverse events (1st), which might represent novel insights into the pharmacological treatment of patients with BD in the maintenance phase. Although it did not prevent recurrence/relapse of depressive episodes, asenapine ranked fifth for this outcome. It should be noted that this ranking was based on only one 26-week, double-blind, randomized, placebo-controlled trial of asenapine. Furthermore, asenapine carries the risk of oral hypoesthesia [34], and this distinctive side effect makes it difficult to blind [13]; the asenapine study might therefore be subject to performance and detection biases.

Olanzapine and quetiapine outperformed placebo in all efficacy outcomes and all-cause discontinuation. Quetiapine results should be interpreted with caution because all the quetiapine studies included in our meta-analysis used enrichment designs and were industry sponsored. However, sensitivity analyses adjusting for these factors demonstrated that quetiapine outperformed placebo in all efficacy outcomes. Thus, olanzapine and quetiapine showed good efficacy and acceptability in adult patients with BD in the maintenance phase. However, olanzapine and quetiapine carry a risk of somnolence and dry mouth, respectively. The second NMA demonstrated that combination therapies of these SGAs with LIT/VAL also carried the risk of increased weight.

Recent treatment guidelines recommend lithium as a first-line drug for the treatment of adult patients with BD in the maintenance phase [6, 56, 57]. The numbers of studies and patients treated with lithium were the largest among the active drugs included in our study (19 studies and 1335 patients). A recent meta-review including RCTs and non-RCTs reported that lithium had anti-suicidal effects for patients with psychiatric disorders including BD [58], although our meta-analysis did not show this effect. Our meta-analysis demonstrated that lithium outperformed placebo in all efficacy outcomes; however, it did not rank highly for the outcomes. Although lithium outperformed placebo regarding all-cause discontinuation, lithium increased discontinuation due to adverse events, and carried risks of extrapyramidal symptoms/use of anticholinergic agents, nausea, and diarrhea. Because 17 of the 19 lithium studies included in our meta-analysis did not use enrichment designs, most patients assigned to lithium in our meta-analysis were not evaluated for efficacy, acceptability, tolerability, and safety of lithium prior to the assignment. Nevertheless, sensitivity analysis of enrichment designs using the design-adjusted model demonstrated similar results to the unadjusted analysis. Accordingly, we concluded that lithium still had benefits for patients with BD in the maintenance phase, provided that due care is taken with regard to its side effects.

A Finnish nationwide cohort of 18,018 patients with BD (mean follow-up time = 7.2 years) demonstrated that lithium and long-acting injectable (LAI) antipsychotics were effective in preventing hospitalization due to mental or physical illness compared with no drug use [59]. Unlike the results of our meta-analysis, the study indicated that lithium was superior to other mood stabilizers and that LAI antipsychotics are markedly better than identical oral formulations of antipsychotics. Quetiapine (most widely used in the study population) showed only an 8% risk reduction. Thus, there appear to be inconsistencies between the results of our meta-analysis, which included RCTs (providing the most robust evidence), and those of the cohort study (reflecting “real-world” routine clinical practice). We could not simply compare results between the studies for the following reasons [59, 60]. First, the study durations of RCTs are generally shorter than those of non-RCT studies. Second, the symptoms of trial populations are evaluated in more detail than those of patient populations in clinical practice. Hence, symptoms might be detected earlier, and earlier intervention given to trial populations than to patients in clinical practice. Third, because RCTs often have stringent inclusion and exclusion criteria (e.g., excluding patients with the most comorbidities and the highest severity of illness, such as suicidal ideation and suicidal attempt), trial populations are often not representative of those in clinical practice.

Our study has several limitations. First, the confidence in evidence of the first NMA was often low or very low. In the primary outcome, confidence levels were deemed to be low or very low in 90.8% of comparisons with placebo. Second, we did not perform the inconsistency test for dry mouth and prolactin-related adverse events in the first NMA or for any outcome in the second NMA. Third, the range of study durations included in our meta-analysis was 17.3–171.4 weeks. Thus, the long-term efficacy and safety of drugs still need to be verified. Fourth, we did not cover important clinical issues that might inform treatment decision-making in routine clinical practice (e.g., combination with nonpharmacological treatments). Fifth, a cost-effectiveness analysis should be performed and included in the decision-making process.

In conclusion, our study represents the most comprehensive evidence currently available to guide the initial choice of pharmacological treatment for adult patients with BD in the maintenance phase. Clinicians and patients should consider the maintenance phase when selecting the treatment for the acute phase of BD.