Depressive disorders and bipolar disorders (BD) are the leading source of disability worldwide. They are also associated with psychosocial dysfunction, high societal costs [1,2,3] (e.g., 65% of people with BD are unemployed [4]), and premature mortality compared to the general population [5, 6], largely due to medical comorbidities, including diabetes [7], metabolic syndrome [8] and cardiovascular diseases [9, 10]. For instance, BD is associated with an estimated reduction in life expectancy of 12–20 years for men and 11–17 years for women compared to the general population [11]. Ultimately, poor physical health is a major public health concern for people with mood disorders [12], since 96.3% of people with BD have at least one co-occurring medical condition [13]. Many possible pathways contribute to poor physical health among patients with mood disorders, including genetic vulnerability [14], environmental risk factors such as economic disadvantage [15] and loneliness [16], unhealthy lifestyle and adverse treatment effects [17]. People with mood disorders engage in less physical activity [18] and have a poorer quality diet, with increased intake of sugar, fat and carbohydrates [19]. Smoking [20] and other substance use disorders [21] are highly comorbid in this population. Additionally, despite the evidence for the efficacy of antidepressants, mood stabilizers and/or antipsychotics in the treatment of mood disorders, these agents may also expose patients to a higher risk of common side effects, such as weight gain and metabolic syndrome [22,23,24].

Given this alarming association between mental and medical disorders, increased attention is being paid to the metabolic and physical adverse effects of psychotropic medications, both during acute [22, 23, 25] and long-term management of these disorders [23, 26, 27]. There is also evidence for beneficial effects of pharmacological and non-pharmacological interventions on physical health outcomes in people with severe mental illness, for example schizophrenia and dementia [28, 29]. However, to the best of our knowledge, such evidence synthesis is missing in the context of mood disorders. To fill this gap, we sought to aggregate the existing top-tier evidence from the most recent/largest published (network) meta-analyses [(N)MAs] of randomized controlled trials (RCTs) in people with mood disorders reporting on physical health outcomes and intolerability-related discontinuation. Our aims were to determine the magnitude of efficacy of pharmacological and non-pharmacological interventions targeting physical health outcomes, and to grade the quality of the evidence, which indicates how much the data from a given source can be trusted.


A systematic review of (N)MAs of RCTs was conducted (eTable 1–2) [30], following a pre-defined protocol (link available in eMethods). Two independent authors searched MEDLINE/PubMed and PsycINFO from their respective inception dates up to January 28th, 2022, without language restrictions, for (N)MAs of RCTs reporting on any physical health outcome among people with mood disorders (search string available in eMethods). A manual search of the reference lists of included meta-analyses was also conducted.

Inclusion criteria were operationalized according to PICOS (population, interventions, comparisons, outcomes and setting/study design). Included were (N)MAs of RCTs in depressive disorders or BD, confirmed according to DSM or ICD criteria, or validated scales with cut-off, reporting on any physical health outcome or intolerability-related discontinuation, including the following:

  • Any physical health markers, such as body weight, levels of glucose and lipid metabolism parameters, cardiovascular illness (e.g., myocardial infarction, stroke, TIA, pulmonary embolism, etc.), respiratory illness (e.g., lung cancer, COPD, etc.).

  • Parameters of physical fitness: maximal or peak oxygen uptake, muscle strength, etc.

  • Any biomarkers investigated: HbA1c, C-reactive protein or other blood and serum markers.

  • Physical health related quality of life.

No restriction was made regarding age or control group (e.g., active comparison, placebo, treatment as usual/usual care, waiting list, no treatment). Age groups were categorized as youth if < 18 years old, adults if 18–64, and elderly if ≥ 65; if multiple age groups were present, we extracted data for single age groups and/or for mixed age groups, whichever were available.
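The age cut-offs above can be written as a small helper. This is only an illustrative sketch: the function name `age_group` is hypothetical, and treating fractional ages between 64 and 65 as "adults" is an assumption not stated in the text.

```python
def age_group(age_years: float) -> str:
    """Classify an age using the cut-offs stated in the text.

    youth: < 18, adults: 18-64, elderly: >= 65.
    Assumption: ages between 64 and 65 are counted as adults.
    """
    if age_years < 18:
        return "youth"
    if age_years < 65:
        return "adults"
    return "elderly"


for age in (17, 18, 64, 65):
    print(age, age_group(age))
```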

For each MA we extracted author, year, population of interest, age group, intervention, control, outcome, and effect size data (with 95% confidence intervals, CI) for all relevant outcomes, as well as the number of RCTs and participants for each effect size. We also extracted measures of heterogeneity, as reported by authors, and publication bias. For NMAs, we included only outcomes where at least 1 direct comparison was available.

Methodological quality of the included meta-analyses was measured with “A Measurement Tool to Assess Systematic Reviews” (AMSTAR) (range 0–11, with a score of 8 or higher indicating high quality) [31], complemented with six additional items previously developed that also measure the quality of included RCTs (AMSTAR-Plus Content, range 0–8) [32]. For NMAs we modified AMSTAR’s item 9 into “Did authors mention transitivity assumption, and inconsistency?” and AMSTAR-Content’s item 5 into “Did the NMA neglect/violate transitivity assumption, and were results affected by inconsistency?”, maintaining the same scoring [23]. We categorized quality into three levels, low/medium/high (L/M/H): AMSTAR-PLUS score was considered low when < 4, medium 4–7, high > 7; AMSTAR-Content score was classified low if < 4, medium 4–6, high > 6 [23]. Overall quality was determined by the lower of the two scores, as done before [33]. All phases of screening, extraction, and quality assessment were performed by two authors independently (GC, MS, MO, MAG, LC, GV), and conflicts resolved with consensus (GC, MS).
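The categorization rule can be illustrated with a short sketch. It rests on two assumptions that the text does not spell out: the first set of cut-offs (low < 4, medium 4–7, high > 7) is applied here to the 0–11 AMSTAR score, and "the lower of the two scores" is read as the lower of the two resulting quality levels. All function names are hypothetical.

```python
# Quality levels in ascending order, so they can be ranked and compared.
LEVELS = ["low", "medium", "high"]


def amstar_level(score: int) -> str:
    """Cut-offs quoted in the text: low < 4, medium 4-7, high > 7."""
    if score < 4:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


def content_level(score: int) -> str:
    """Cut-offs quoted in the text: low < 4, medium 4-6, high > 6."""
    if score < 4:
        return "low"
    if score <= 6:
        return "medium"
    return "high"


def overall_quality(amstar: int, content: int) -> str:
    """Assumption: overall quality is the lower of the two levels."""
    return min(amstar_level(amstar), content_level(content), key=LEVELS.index)


# Example: an AMSTAR/Content pair of 9/2 is limited by its Content score.
print(overall_quality(9, 2))
```

Under this reading, a meta-analysis can only be rated high overall if both the methodological score and the content score are high.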

We reported data as directly extracted from the published meta-analyses. If necessary (i.e., non-standardized effect size, or fixed-effects model despite large heterogeneity, defined as I2 > 50%) and whenever sufficient data were provided, we converted results to standardized outcomes with Comprehensive Meta-Analysis (CMA, version 2). The quality scores (AMSTAR, AMSTAR-Plus Content, AMSTAR-Plus Total score) and sample size were used in meta-regression analyses if at least 10 studies provided data.


Search results

Of 3 847 articles, 11 NMAs and 86 MAs were included (Fig. 1), reporting on 69 pharmacological (47 monotherapies, 22 combinations), six non-pharmacological (three monotherapies, three combinations), and three combinations of pharmacological/non-pharmacological interventions. Overall, 40 different physical health outcomes, three combinations of physical health outcomes, and two global tolerability outcomes (any adverse event, intolerability-related discontinuation) were investigated. Control interventions included placebo, wait-list, no treatment, usual care, and active pharmacological or non-pharmacological interventions (eTable 3). Publications excluded after full-text assessment, with reasons for exclusion, are reported in eTable 4.

Fig. 1 PRISMA flow chart.

The number of trials for a specific health outcome ranged from 2–65 (median = 5, interquartile range = 3–12). Mean participant age across meta-analyses was 41.6 years, and 41.8% were male. Altogether, 8.2% of meta-analyses included youth, 76.5% adults, 3.1% elderly, and 12.2% mixed age groups. Overall, 63.3% (N)MAs included depressive disorders (24.2% with comorbid medical conditions), 34.7% BD, and 2.0% both.

Dose of pharmacological interventions and frequency of non-pharmacological interventions were often not ascertainable and, when ascertainable, had wide ranges and varied across included (N)MAs. Mean trial duration was 15.1 weeks for pharmacological interventions, 21.1 weeks for non-pharmacological interventions, and 25.3 weeks for mixed pharmacological/non-pharmacological interventions. A mean trial duration longer than 12 weeks was present in 21.7%, 75%, and 83.3% of comparisons, respectively (eTable 5).

Quality assessment of the included meta-analyses and meta-regression analysis

AMSTAR/AMSTAR-Plus Content mean score was 7.8 ± 2.2/3.8 ± 1.7 in the whole sample, 7.8 ± 2.3/3.8 ± 1.7 in pharmacological intervention (N)MAs, 7.6 ± 1.2/2.1 ± 1.3 in non-pharmacological intervention (N)MAs, and 7.8 ± 1.9/3.8 ± 1.3 in mixed (N)MAs. Sixty-nine (70.4%) (N)MAs had an AMSTAR ≥ 8, four (4.1%) had the maximum score (11). Three (N)MAs had the maximum AMSTAR-Plus Content (8). Also, subdividing the (N)MAs by the target of the intervention, namely intentionally directed to influence physical health outcomes versus iatrogenic effects of medications, in 13 out of 24 (N)MAs (54.2%) of interventions targeting physical health outcomes, and in 29 out of 74 (N)MAs (39.2%) of interventions targeting iatrogenic medication effects, the mean AMSTAR Content score was ≤ 3.

Forty-five (N)MAs included only double-blind trials (45.9%), 21 (21.4%) had a sample size < 500 in all outcomes, and 34 (34.7%) > 1 000 in all outcomes. Significant heterogeneity in all outcomes was present in 38 (38.8%) (N)MAs, and in no outcome in 37 (37.8%). Finally, publication bias in all outcomes was present in 73 (74.5%) (N)MAs, and in no outcome in 14 (14.3%).

Meta-regression analysis was possible only for pharmacological interventions, compared with placebo or other active interventions (Table 1). In adult patients, compared to active controls, AMSTAR methodology and Total scores negatively moderated effect sizes regarding discontinuation due to adverse events (beta = −0.09/−0.05, p = 0.0004/0.048) and weight gain (beta = −0.16/−0.14, p = 0.01/0.03); in mixed adults and elderly population, these same variables positively moderated effect sizes for any adverse event (beta = 0.04/0.03, p = 0.001/0.004). Finally, a statistically significant, but negligible, moderating influence on effect sizes emerged in adult patients compared to active controls regarding intolerability-related discontinuation for sample size (beta = 0.0001, p = 0.047). No significant moderating effect emerged for youth and for comparisons with placebo.

Table 1 Meta-regression results.

Physical health outcomes of pharmacological and non-pharmacological interventions

Detailed results are reported in Tables 2, 3. For each outcome, below we summarize key findings for age groups separately, and accounting for control group. Additionally, we report separately data for interventions directly targeting physical health, and for iatrogenic effects of pharmacological interventions. In BD, “any” phase (or without specification of a phase) means that authors of (N)MA didn’t account for different phases of the disease in the included samples; otherwise, the specific phase considered is reported.

Table 2 Efficacy of interventions to improve physical health in subjects with mood disorders, compared to Placebo/TAU/Wait list/No treatment.
Table 3 Efficacy of interventions to improve physical health in subjects with mood disorders compared to active or mixed controls.

Interventions directly targeting physical health

Physical disease-related outcomes, fitness and quality of life


Compared to treatment as usual (TAU) or placebo, collaborative care treatment yielded fewer major adverse cardiac events in adults with acute coronary syndrome (ACS) and depression (ES = small, AMSTAR/Content = 9/2). SSRIs reduced readmissions for coronary heart disease (CHD) in those with CHD and depression (ES = small, AMSTAR/Content = 6/4). In the elderly, SSRIs did not impact exercise tolerance or forced expiratory volume in the first second (FEV1; eResults) in chronic obstructive pulmonary disease (COPD) and depression, nor stroke recurrence in post-stroke depression. In adults with mixed chronic medical conditions and depression, collaborative care ensured more somatic diagnostic or treatment procedures than TAU (ES = small, AMSTAR/Content = 9/3).

Versus TAU, physical exercise improved cardiorespiratory fitness in people with depression (VO2 max or peak, ES = moderate, AMSTAR/Content = 8/2).

A benefit in physical health-related quality of life (PHQoL) emerged against TAU/wait-list/no treatment for mixed psychological interventions in adults with depression (ES = small, AMSTAR/Content = 6/0), mixed psychological/pharmacological interventions in adults with ACS and depression (ES = small, AMSTAR/Content = 5/5), and physical exercise in adults and elderly with depression (ES = moderate, AMSTAR/Content = 7/1). No difference was found for SSRIs versus placebo in adults with mixed chronic medical conditions and depression, and in elderly with COPD and depression.

Glucose metabolism


In adults with depression and type 1/2 diabetes mellitus (T1/2DM), a decrease in fasting glucose was observed with cognitive behavioural therapy (CBT) versus TAU (ES = moderate, AMSTAR/Content = 7/1) and SSRIs (but not paroxetine) versus placebo (ES = small, AMSTAR/Content = 9/3). A significant decrease in HbA1c compared to TAU/wait-list/placebo emerged for collaborative care (ES = small, AMSTAR/Content = 9/3), mixed psychological treatments (ES = moderate, AMSTAR/Content=9/2), mixed pharmacological treatments (ES = large, AMSTAR/Content = 9/1), or any of these interventions (ES = small, AMSTAR/Content = 9/4). CBT alone did not replicate this result.

When only T2DM was included, versus TAU, mixed psychosocial interventions significantly reduced both fasting glucose and 2 h postprandial glucose levels (both large ES, AMSTAR/Content = 7/3). Psychological interventions reduced HbA1c (ES = large, AMSTAR/Content=7/2), whilst collaborative care did not.



Insulin levels were not modified by quetiapine immediate-release (IR)/extended-release (XR) versus placebo in BD depression, nor by antipsychotic augmentation of mood stabilizers (MS) in BD.

In adults with depression, neither physical exercise versus TAU nor mixed psychological interventions versus inactive/active treatments modified cortisol levels.

No data on other hormones was found.



In adults with depression, versus placebo, SSRIs (ES = small, AMSTAR/Content = 5/6), serotonin noradrenaline reuptake inhibitors (SNRIs) (ES = small, AMSTAR/Content = 5/7), duloxetine (ES = small, AMSTAR/Content = 1/5) and paroxetine (MD = −5.8 on VAS, AMSTAR/Content = 8/2) reduced pain. No difference emerged comparing duloxetine to paroxetine/fluoxetine, or comparing paroxetine to sertraline/reboxetine.

Collaborative care versus TAU proved beneficial on a composite measure of pain and physical functioning in people with comorbid depression and arthritis or cancer (both small ES, AMSTAR/Content = 9/3, 9/2).

Iatrogenic effects of pharmacological interventions

Body weight and body mass index


In youth with bipolar depression, versus placebo, olanzapine+fluoxetine yielded significant weight gain (ES = large, AMSTAR/Content=8/4).

In youth and adults with BD, aripiprazole and lithium showed no difference in weight gain (WG) compared to placebo; lithium had also a better profile compared to other drugs (ES = small, AMSTAR/Content=8/3). In the depressive phase of BD, versus placebo, quetiapine induced WG (ES = small, AMSTAR/Content = 8/4).


In adults with BD, versus placebo, WG emerged for second-generation antipsychotics (SGA) combined (ES = moderate, AMSTAR/Content=5/5), also in LAI formulation (ES = small, AMSTAR/Content = 9/4). Versus active drugs, WG emerged for SGAs versus other antipsychotics/MS/combination of antipsychotic with antidepressant (ES = moderate, AMSTAR/Content=5/5), olanzapine versus lithium (ES = small, AMSTAR/Content = 7/5), antipsychotics augmenting MS (especially olanzapine and quetiapine), olanzapine, quetiapine (all moderate ES, AMSTAR/Content = 7/5, 7/3, 7/4).

In the manic phase of BD, versus placebo, asenapine induced WG (NNH = 19, AMSTAR/Content=4/4), as did olanzapine (ES = moderate, AMSTAR/Content=8/6), risperidone (ES = small, AMSTAR/Content = 8/5), SGAs (ES = small, AMSTAR/Content=8/3), ziprasidone (NNH = 21, AMSTAR/Content=4/3), and valproate (NNH = 30, AMSTAR/Content=4/4). No difference emerged for aripiprazole, cariprazine, haloperidol alone or as augmentation to lithium or valproate, or for paliperidone. Versus olanzapine, asenapine induced less waist circumference increase and WG (MD = −0.34/−0.40, AMSTAR/Content = 7/5), and valproate less WG (ES = small, AMSTAR/Content = 11/4).

In the depressive phase of BD, versus placebo, WG emerged for olanzapine (ES = large, AMSTAR/Content=10/3), cariprazine (ES = moderate, AMSTAR/Content=10/6), quetiapine XR (ES = moderate, AMSTAR/Content=9/4) and IR (ES = small, AMSTAR/Content = 9/4), SGA (ES = small, AMSTAR/Content = 7/6). Lurasidone and aripiprazole did not affect weight. Modafinil and anti-ADHD medications protected against WG (both small ES, AMSTAR/Content = 9/3, 7/7).

In adults with depression, versus placebo, WG emerged for aripiprazole (ES = moderate, AMSTAR/Content = 9/2), brexpiprazole (ES = moderate, AMSTAR/Content=7/6), olanzapine (ES = large, AMSTAR/Content = 9/2) and quetiapine (ES = small, AMSTAR/Content = 9/2). In head-to-head trials, fluoxetine showed less WG than tricyclic antidepressants (TCAs) (NNT = 39, AMSTAR/Content = 3/2), SSRIs (NNT = 23, AMSTAR/Content = 3/2), amitriptyline (NNT = 25, AMSTAR/Content = 3/2), doxepin (NNT = 17, AMSTAR/Content=3/2), imipramine (NNT = 40, AMSTAR/Content=3/2) and paroxetine (NNT = 15, AMSTAR/Content=3/2). Paroxetine was less likely to lead to WG than maprotiline (ES = large, AMSTAR/Content = 10/3) and mirtazapine (ES = moderate, AMSTAR/Content = 10/3), but more than reboxetine (ES = moderate, AMSTAR/Content = 10/4). Mirtazapine caused more WG than SSRI (ES = moderate, AMSTAR/Content = 4/3).

When both adults and elderly patients were included, augmentation of antidepressants with brexpiprazole led to significant WG (ES = small, AMSTAR/Content=10/5), as did amisulpride compared to fluoxetine (ES = moderate, AMSTAR/Content = 10/2).

In adults with unipolar/bipolar depression, compared to placebo, lurasidone caused weight gain (ES = moderate, AMSTAR/Content = 11/6).


No (N)MA included elderly patients only.

Cardiovascular and respiratory system


No data on youth with BD were found. In youth with depression, versus placebo, SSRIs and TCAs did not affect respiratory symptoms, and SSRIs had no effect on postural hypotension. SSRIs lowered systolic blood pressure (SBP) and diastolic blood pressure (DBP) (both small ES, AMSTAR/Content = 8/8, 8/7). Paroxetine lowered SBP, fluoxetine lowered DBP (both small ES, AMSTAR/Content = 8/7, 8/6). Versus SNRIs, SSRIs lowered SBP and DBP (both small ES, AMSTAR/Content = 8/7), results that were confirmed when adult patients were also included (SBP and DBP small ES, AMSTAR/Content = 8/8, 8/7).


In adults with BD, antipsychotic augmentation of MS yielded no difference on EKG abnormalities, QTc change, or orthostatic hypotension.

In adults with depression, versus placebo, levomilnacipran increased hypertension (NNH = 75, AMSTAR/Content = 2/5) and tachycardia (NNH = 25, AMSTAR/Content=2/5), amitriptyline increased tachycardia (ES = moderate, AMSTAR/Content = 9/2), while imipramine caused more palpitations (ES = small, AMSTAR/Content = 10/3). Versus SNRIs, SSRIs decreased blood pressure (SBP and DBP small ES, AMSTAR/Content = 8/8). Combined hypertension/tachycardia was more frequent with amitriptyline than mirtazapine (ES = small, AMSTAR/Content=10/3) and with milnacipran than fluvoxamine (ES = small, AMSTAR/Content = 8/1); fluvoxamine reduced hypotension/bradycardia versus TCAs (ES = small, AMSTAR/Content = 9/3) and imipramine (ES = moderate, AMSTAR/Content = 9/4). Versus paroxetine, reboxetine yielded less hypotension (ES = small, AMSTAR/Content=10/5) but more dyspnea (ES = moderate, AMSTAR/Content=10/4).


In elderly patients with depression, fluoxetine yielded more (unclearly defined) cardiovascular reactions than escitalopram and sertraline (both large ES, AMSTAR/Content=8/4), but did not differ from citalopram/paroxetine.

Glucose metabolism


In youth with bipolar depression, lurasidone, olanzapine+fluoxetine and quetiapine IR/XR were neutral. In youth and adults with BD, versus placebo, aripiprazole significantly decreased fasting glucose (ES = small, AMSTAR/Content=10/2).


In adults, versus placebo, a significant increase in fasting glucose emerged for asenapine during mania (MD = 0.20, AMSTAR/Content = 7/5), but not in BD depression for aripiprazole, cariprazine, lurasidone, olanzapine and quetiapine IR/XR. Olanzapine increased HbA1c in adults when pooling data from all phases of BD (NNH = 69, AMSTAR/Content = 2/3) but not when restricting analyses to BD depression only. In BD, HbA1c was not modified by asenapine, lurasidone, and quetiapine IR/XR. Versus active drugs, antipsychotic augmentation of MS increased fasting glucose and HbA1c (both small ES, AMSTAR/Content = 7/5, 7/4).

Lipid profile


In youth with bipolar depression, versus placebo, olanzapine+fluoxetine increased total cholesterol and triglycerides (MD = 20.5/38.6, AMSTAR/Content=8/4), quetiapine IR/XR triglycerides (MD = 34.9, AMSTAR/Content=8/3), while lurasidone decreased LDL cholesterol and triglycerides (MD = −5.90/−13.4, AMSTAR/Content=8/4). In youth and adults with BD, versus placebo, aripiprazole significantly decreased total cholesterol (ES = small, AMSTAR/Content=10/2), without altering high-density lipoprotein (HDL) /triglycerides levels.


In adults with BD, versus placebo, no antipsychotic modified total and low-density lipoprotein (LDL) cholesterol or triglycerides. A significant increase in triglycerides emerged for antipsychotic augmentation of MS (ES = small, AMSTAR/Content=7/5), without altering total cholesterol/HDL/LDL.

In BD depression, versus placebo, olanzapine increased total cholesterol (MD = 7.06, AMSTAR/Content = 10/6). Lurasidone and quetiapine IR/XR were neutral on lipid profile.

No data were available for people with unipolar depression.

Liver enzymes


In manic or mixed phase of BD, olanzapine significantly increased liver enzymes compared to placebo (ES = large, AMSTAR/Content=8/4).

Discontinuation due to adverse events, any adverse event


In youth with bipolar depression, placebo was better tolerated than olanzapine+fluoxetine, but less well tolerated than quetiapine IR/XR (ES = small, AMSTAR/Content = 8/4, 8/3). In youth and adults with BD, compared to placebo, aripiprazole was less well tolerated in any phase, valproate in the manic phase, and quetiapine in the depressive phase (all small ES, AMSTAR/Content = 10/4, 8/4, 8/4).

In youth with depression, more discontinuation than placebo emerged for SNRIs, SSRI/SNRI (both small ES, AMSTAR/Content = 10/6) and duloxetine, which was also less tolerated than fluoxetine (both small ES, AMSTAR/Content=11/8). No difference emerged for SSRIs, both grouped and individually.

Considering any adverse event, in youth with depression, compared to placebo, SSRIs grouped and paroxetine were less tolerated (ES = small, AMSTAR/Content = 10/3), but not other SSRIs, while in adolescents less tolerability emerged for any antidepressant (ES = small, AMSTAR/Content = 10/5). In youth and adults in depressive phase of BD, compared to placebo, quetiapine yielded more adverse events (ES = small, AMSTAR/Content = 8/4).


In adults with BD, lithium was less tolerated than placebo (ES = small, AMSTAR/Content=5/6). More intolerability-related discontinuation was observed also with long-acting injectable antipsychotics (ES = small, AMSTAR/Content=9/4), while better tolerability emerged for asenapine (ES = small, AMSTAR/Content=5/6). In head-to-head comparisons, lithium was less tolerated than lamotrigine (ES = moderate, AMSTAR/Content = 5/6), quetiapine (ES = small, AMSTAR/Content=5/6) and when augmenting valproate (ES = small, AMSTAR/Content = 5/6).

In the manic phase of BD, compared to placebo, worse tolerability emerged for cariprazine, carbamazepine, ziprasidone (all small ES, AMSTAR/Content = 7/8, 8/2, 8/3) and valproate (NNH = 25, AMSTAR/Content = 4/4), and in the depressive phase for SGAs, quetiapine IR/XR, aripiprazole (all small ES, AMSTAR/Content = 7/6, 10/7, 10/5) and lamotrigine (NNH = 27, AMSTAR/Content = 4/5).

In adults with depression, compared to placebo, intolerability-related discontinuation was greater with amitriptyline (ES = moderate, AMSTAR/Content=9/5), aripiprazole, brexpiprazole, quetiapine XR, fluoxetine and paroxetine (all small ES, AMSTAR/Content=8/5, 7/6, 8/5, 3/6, 8/5). In head-to-head comparisons, agomelatine was better tolerated than SSRI and venlafaxine (both small ES, AMSTAR/Content=10/4, 10/3), and fluoxetine than paroxetine (ES = small, AMSTAR/Content=10/6), while TCAs were less tolerated than citalopram, fluoxetine, and paroxetine (all small ES, AMSTAR/Content=10/4, 10/4, 10/6). Finally, augmentation of antidepressant treatment with brexpiprazole compared to placebo yielded more intolerability-related discontinuation both in adults (NNH = 54, AMSTAR/Content = 2/5) and adults+elderly (small ES, AMSTAR/Content = 10/5).

When combining adults and elderly with depression, compared to placebo, higher discontinuation emerged for TCA (ES = moderate, AMSTAR/Content = 10/6), MAO-I, SSRI (both small ES, AMSTAR/Content = 10/6, 10/6); TCA were also less tolerated than SSRI and antipsychotics (both small ES, AMSTAR/Content = 10/6, 10/5), while mirtazapine was less tolerated than sertraline (ES = small, AMSTAR/Content = 10/4). In post-stroke depression, worse tolerability emerged for doxepin compared both to placebo and paroxetine (both large ES, AMSTAR/Content = 8/2, 8/2).

Considering any adverse event, in adults with BD higher rates were observed with augmentation of MS with any antipsychotic or ziprasidone (both small ES, AMSTAR/Content = 7/6, 7/4), but not risperidone. In manic phase of BD, this was observed again with augmentation of MS with any antipsychotic, and with valproate compared to placebo (both small ES, AMSTAR/Content = 9/5, 11/4).

In adults with depression, compared to placebo, more adverse events were observed with dextroamphetamine, amitriptyline (both moderate ES, AMSTAR/Content = 8/4, 9/2), aripiprazole, brexpiprazole, bupropion, fluoxetine, lisdexamphetamine, paroxetine, venlafaxine, vortioxetine (all small ES, AMSTAR/Content = 10/7, 10/7, 8/3, 8/3, 8/5, 8/5, 8/5, 8/3), but not mirtazapine. In head-to-head comparisons, fluvoxamine and paroxetine (both small ES, AMSTAR/Content = 9/3, 10/6) had fewer adverse events than TCAs, and agomelatine fewer than SSRI and paroxetine (both small ES, AMSTAR/Content = 10/4, 10/3). Furthermore, citalopram, while being better tolerated than amitriptyline, was less well tolerated than imipramine (both small ES, AMSTAR/Content = 10/3).

In both unipolar and bipolar depression, lamotrigine was equally tolerated as placebo or other active drugs.

In adults and elderly with depression, SNRI yielded more adverse events than placebo (ES = small, AMSTAR/Content=10/6), as TCA did compared to placebo, SSRI and antipsychotics (all small ES, AMSTAR/Content = 10/6, 10/5, 10/4).


In elderly with depression, higher discontinuation emerged for SSRI and SNRI compared to placebo, and for TCA and amitriptyline compared to SSRI (all small ES, AMSTAR/Content = 9/5, 9/5, 9/5, 9/3).


To our knowledge, this umbrella review of meta-analyses is the first to systematically and quantitatively report on effects of pharmacological and non-pharmacological interventions on physical health outcomes in people with mood disorders. Along with presenting the available meta-analytic findings, this review also sheds new light on those areas where top tier evidence is currently lacking. Therefore, these findings should help guide current clinical practice, while also identifying where future research should focus.

Overall, compared to placebo, out of 333 associations, 205 (61.6%) were neutral, 93 (27.9%) were worse, and 35 (10.5%) were better. Against active comparison, out of 372 comparisons, 265 (71.2%) were neutral. For the 235 significant effect sizes, the magnitude was small in 77.0%, moderate in 16.2%, and large in 6.8%.
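The proportions in this paragraph can be checked with simple arithmetic on the reported counts. The sketch below uses only figures quoted in the text; the variable and function names are hypothetical. It also shows that the non-neutral placebo and active comparisons together add up to the 235 significant effect sizes mentioned.

```python
# Counts as reported in the text.
placebo_total, placebo_neutral, placebo_worse, placebo_better = 333, 205, 93, 35
active_total, active_neutral = 372, 265


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Placebo comparisons: neutral, worse, better.
print(pct(placebo_neutral, placebo_total))  # 61.6
print(pct(placebo_worse, placebo_total))    # 27.9
print(pct(placebo_better, placebo_total))   # 10.5

# Active comparisons: neutral.
print(pct(active_neutral, active_total))    # 71.2

# Non-neutral (significant) comparisons across both control types.
significant = (placebo_total - placebo_neutral) + (active_total - active_neutral)
print(significant)                          # 235
```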

Regarding non-pharmacological interventions, all were delivered with the purpose to ameliorate physical health outcomes in people with depression. When compared to TAU/wait-list/placebo, psychosocial interventions had small to moderate effect sizes in improving diabetes, namely glycated hemoglobin, fasting and post-prandial glycaemia. CBT for diabetic patients with comorbid depression had also a moderate effect, but only on fasting glucose. Exercise, on the other hand, was moderately efficacious in ameliorating cardiorespiratory fitness in people with depression, which is associated with risk for cardiovascular and all-cause premature mortality in the general population [34], with mental and PHQoL among people with BD [35], with risk of developing depression [36], and recurrent depressive episodes [37]. Finally, both psychosocial interventions and physical exercise led to improved PHQoL, with small or moderate effect sizes. No clear data were found for head-to-head comparisons between non-pharmacological and pharmacological interventions, since in the few available studies, intervention and control conditions were of mixed nature, making a reliable comparison between the arms unfeasible. No data was found regarding the included acceptability/tolerability outcomes.

With regards to pharmacological interventions, across 49 pharmacological strategies, 28 differed significantly from the control condition on various physical health outcomes. Only antidepressants were assessed for a direct beneficial effect, and only in people with unipolar depression, with or without a comorbid medical condition. Compared to placebo, SSRIs were protective against CHD readmission with a small effect. Considering metabolic outcomes, glycemic control (fasting glucose and HbA1c) in people with comorbid depression and diabetes was ameliorated by SSRIs versus placebo with a small effect, while a large effect on HbA1c was observed with pooled pharmacological interventions for depression when the control group included also wait-list and TAU. No data on lipids were available. For pain relief, compared to placebo, both SSRIs and SNRIs, in particular duloxetine and paroxetine, had a small beneficial effect, without differences in head-to-head comparisons.

Considering iatrogenic effects of medications, data on antipsychotics derived largely from (N)MAs in patients with BD, in any, manic or depressive phase, and considered mainly glucose, lipids and weight-related parameters. Compared to placebo, olanzapine showed the worst profile, followed by asenapine, and quetiapine. Our findings regarding glucose metabolism for treatment with quetiapine seem to contradict some literature [38] on different disorders and drug label; results of this work must be interpreted also accounting for evidence on the same molecule in other disorders. Interestingly, aripiprazole led to WG only in unipolar depression. While this finding could reflect moderation by diagnosis, which has not been directly tested or demonstrated, it could also reflect an order effect, i.e., the higher likelihood of prior exposure to WG-inducing medications, such as other antipsychotics or mood stabilizers in BD versus depressive disorders, attenuating further WG during the subsequent exposure to aripiprazole. In this context of mainly non-antipsychotic-naïve patients, aripiprazole seems to also improve fasting glucose and total cholesterol, each with a small effect. These results should however be considered cautiously since they rely on short-term data, while longer-term data in the same MA showed no significant differences with aripiprazole from placebo. Moreover, notably, in adults with bipolar depression lurasidone appeared to be neutral for all extracted glucose- and lipid-related outcomes, and even advantageous in youth. The observed WG with lurasidone in unipolar/bipolar depression requires caution in interpretation: in that MA this refers to lower doses, while at higher doses the effect was neutral. In head-to-head comparisons, worsening of glucose metabolism (but not lipids) and WG emerged for antipsychotic augmentation of MS. 
Regarding tolerability, compared to placebo, antipsychotics, both grouped and individual, showed a higher rate of intolerability-related discontinuation. Comparing different pharmacological interventions, more frequent adverse events and related treatment discontinuation were observed with antipsychotic augmentation of MS, ziprasidone in particular. Notably, data on cardiovascular safety were not widely available.

Considering mood stabilizers, again assessed almost only in BD, neither lithium nor valproate led to significant WG compared to placebo, nor did lithium when compared to other MS and SGAs; no other data on metabolic or cardiovascular outcomes were available for these or other MS. Tolerability was worse with lithium, valproate and lamotrigine in BD (in any, manic, or depressive phase), while lamotrigine did not differ from placebo when the population included both unipolar and bipolar depression.

Antidepressants were assessed in people with unipolar depression, with or without a comorbid medical condition. Compared to placebo, antidepressants had a very small lowering effect on both SBP and DBP; among individual compounds, this was observed with fluoxetine for DBP, and with paroxetine for SBP. In direct drug comparisons, SSRIs had a lower hypertensive effect than SNRIs (small) and imipramine (moderate). For general cardiovascular health, in elderly patients, fluoxetine showed a very large worsening effect compared to sertraline and escitalopram. A small effect on weight gain emerged for augmentation of antidepressants with brexpiprazole, while in direct comparisons paroxetine was worse than reboxetine but better than maprotiline and mirtazapine, with a moderate to large effect. A moderate effect on weight gain also emerged for mirtazapine compared to SSRIs. Considering tolerability in depression, all pharmacological classes were in general less well tolerated than placebo; SSRIs were better tolerated than TCAs, while no significant differences emerged in SSRI versus SSRI or SSRI versus SNRI comparisons, with the exception of duloxetine, which showed higher intolerability-related discontinuation rates. ES magnitude ranged from small to moderate.

No data emerged regarding possible advantages/caveats of treatments for mood disorders for COVID-19-related diseases. The need for more RCTs on this topic, considering the potential disadvantages that mood disorders may cause, has been highlighted by a recent review [39].

The AMSTAR methodology score of included (N)MAs was overall high, while the quality of included RCTs (AMSTAR Content score) was more variable and rarely high. High overall quality scores pertained to outcomes regarding pharmacological interventions in youth with unipolar depression, in particular blood pressure and overall tolerability. Data on cardiovascular outcomes, lipids, and pain, almost exclusively of pharmacological nature and in adults, were supported mainly by medium overall quality scores; this was also true for WG and tolerability outcomes in BD patients, while in patients with depression the data had more low-quality comparisons. This low-quality assessment also applies to interventions targeting direct improvement of physical health outcomes. Glucose metabolism-related outcomes showed a higher frequency of low-quality comparisons, both for pharmacological and non-pharmacological interventions, with the exceptions of collaborative care and antipsychotic augmentation of MS, which were supported by medium quality. Cardiorespiratory fitness outcomes were also characterized by low overall quality.

Meta-regression analyses were possible for very few outcomes, namely any adverse event, intolerability-related discontinuation and WG, and only for pharmacological interventions. With an active comparator, AMSTAR methodology and Total scores showed a negative moderating effect on intolerability-related discontinuation and weight gain, thus indicating smaller between-drug differences in higher-quality (N)MAs. Conversely, higher quality was associated with larger between-drug differences in adverse event frequency.

Based on the results of this comprehensive umbrella review, psychosocial interventions showed the most beneficial effects on diabetes-related parameters. Conversely, treatment with antipsychotics (SGAs in particular) had the highest risk profile for worsening glucose metabolism and lipid profile, and for inducing WG. Exercise seems to be important in improving cardiorespiratory fitness and physical health-related quality of life. For co-morbid pain, SNRIs and SSRIs have small beneficial effects, and SSRIs can also reduce readmission rates when CHD is present. In elderly patients, sertraline and escitalopram should be preferred over fluoxetine. In a recent meta-review on people with schizophrenia [28], non-pharmacological interventions (e.g., diet/lifestyle-oriented, CBT) were also beneficial on a range of auxological and metabolic outcomes, as was modification of previous pharmacotherapy with olanzapine or quetiapine. However, in that population evidence of efficacy also emerged for non-psychotropic medications, such as metformin and topiramate, for which no data were found in our review, leaving unanswered the question of whether those drugs have a maintained or differential effect in the mood disorder population.

NICE guidelines for depression [40, 41] and EPA guidelines [42, 43] cite both pharmacological (SSRI) and non-pharmacological (physical exercise, CBT) interventions as first-line treatments, while collaborative care is reserved for non-responders. Collaborative care is a structured psychosocial intervention, delivered by both primary care physicians and mental health professionals, that comprises case management, close collaboration between primary and secondary physical health services and specialist mental health services in the delivery of services, the provision of a range of evidence-based interventions, and the long-term coordination of care and follow-up [40, 41]. However, physical health-related harms and benefits are not clearly considered in these guidelines. CANMAT guidelines for depression [44,45,46] recognize a role for psychotherapy, especially CBT, to improve adherence to medical interventions. Our findings further support offering a psychosocial intervention early in treatment, especially in people with comorbid diabetes, to improve physical health and medical comorbidity management, and preferring SSRIs as the pharmacological choice. In our results, CBT alone improved fasting glucose but not HbA1c, whereas collaborative care did improve HbA1c. The benefit of physical exercise on cardiorespiratory fitness is also in line with EPA guidelines [47]. Considering the safety of pharmacological interventions, both NICE and CANMAT guidelines suggest lithium, aripiprazole, olanzapine, quetiapine and risperidone as antidepressant augmentation strategies; our review can contribute to a better risk/benefit-based choice.

Considering bipolar disorder, both NICE [48] and CANMAT [49] guidelines suggest offering psychotherapy in bipolar depression and/or in maintenance phases, but focus only on mental health-related outcomes, and unfortunately no new data emerged in our review regarding possible physical health benefits. Our results on iatrogenic effects of medications are instead largely supported by CANMAT guidelines, which emphasize high safety concerns for augmentation of an MS with an AP, with olanzapine having the worst profile on weight and metabolic syndrome. Lurasidone is recognized as having little to no safety concerns in monotherapy, while having more issues in combination with MS. Interestingly, asenapine is assigned only some risk for WG in long-term use, while our results indicated detrimental metabolic effects also early in treatment.

Taken together, our data offer clinicians perspectives on the potential best evidence-based methods to address specific physical health issues in people with affective disorders, or at least to prevent poor physical health by choosing safer medications. In patients with affective disorders and diabetes, clinicians should consider both pharmacotherapy and psychosocial interventions such as collaborative care. CBT seems less promising, as it improves fasting glucose but not HbA1c. Physical exercise should also be considered due to its beneficial effects on cardiorespiratory fitness in this population, along with the broader benefits for physical and mental health established elsewhere [50]. Clinicians should also keep in mind the potentially harmful effects of SGAs, in particular olanzapine, preferring other drugs when a comorbidity is present and, in any case, carefully monitoring metabolic blood parameters and weight. Painful symptoms in patients with depression can benefit from treatment with SSRIs or SNRIs. Furthermore, in patients suffering from affective disorders and CHD, SSRIs may be preferred to other classes of antidepressants, with a careful choice of the single molecule, in particular in elderly patients.

We acknowledge that the body of evidence we have summarized in this umbrella review is broad and heterogeneous. We planned a priori to account for such heterogeneity by not combining different population/intervention/control/outcome combinations. We would point out that, by virtue of this a priori approach that does not mix apples and oranges, providing a one-stop-shop synopsis of such a broad body of evidence can be a strength of this umbrella review. Nevertheless, this umbrella review has some limitations. First, although the included meta-analyses were the most updated and/or largest for each specific intervention and outcome, this approach might have led to the exclusion of higher quality MAs with lower sample sizes/number of included studies. Second, interventions tested in individual RCTs for which no (N)MA existed were not included. Third, due to limited data on participant characteristics and interventional designs, conducting meta-regression analyses was possible for only a minority of a priori considered outcomes. Fourth, while the overall quality of the methods of eligible (N)MAs was generally good, the content of the meta-analyzed studies often had low quality; furthermore, AMSTAR-PLUS did not undergo formal quantitative validation (eDiscussion). Fifth, the time-points for effect size measures were not extracted, so possible differences between short-term and long-term data for both beneficial and disadvantageous interventions are not accounted for (yet, at least for pharmacological interventions, which provided the majority of data, most evidence comes from endpoint assessments of short-term RCTs). Moreover, a range rather than absolute values of dosages of included pharmacological interventions was frequently reported, usually spanning from lower to higher doses, thus preventing evaluation of possible more granular differences.
Despite these limitations, this is to the best of our knowledge the first umbrella review of pharmacologic and non-pharmacologic interventions for physical health outcomes in patients with affective disorders. Strengths of this study include its comprehensiveness, the assessment of the methodological quality of meta-analyses and the meta-analyzed RCTs with a validated tool, and the provision of a systematic synthesis of the available evidence from meta-analyses of RCTs addressing this unmet need in the management of people with mood disorders. In conclusion, despite the high risk for physical comorbidities in people with affective disorders and their impact on individuals and the health system, the existing evidence for effective pharmacological and non-pharmacological interventions to prevent and treat these conditions is still limited. Sufficiently large and qualitatively excellent individual RCTs are therefore necessary. In addition, the field should move from study-level to patient-level meta-analyses, as this would provide a more personalized picture of treatment effects for individuals, derived from adequately powered subgroup analyses. Comparing pharmacological and non-pharmacological interventions in the same trial would also be desirable, and there is a need for large-scale investigations of combinations of pharmacological and non-pharmacological regimens, as well as preventive interventions, which aim to prevent physical comorbidities even before their onset.