Bipolar disorder (BD) is a recurrent, severe illness characterized by episodes of mania (or hypomania) and depression, and affects ~1–3% of the population.1 The disease is highly heritable2 but the underlying pathophysiology is not yet understood. Although structural brain abnormalities in BD have been reported, the pattern of structural brain abnormalities based on magnetic resonance imaging (MRI) is still not clearly defined. A set of retrospective meta-analyses of structural MRI studies of BD and healthy controls found right lateral ventricular enlargement in patients as the only consistent volumetric difference based on previously published studies.3, 4, 5 However, several studies report detectable differences in the dorsolateral prefrontal cortex,6 the ventrolateral prefrontal cortex,7 and the anterior and subgenual cingulate cortices.8 Meta-analytic studies highlight the substantial heterogeneity across studies for several structures of interest, notably the amygdala and thalamus.3 Previously, a mega-analysis combined data from multiple sites around the world into a single model, and found differences in lateral ventricle, temporal lobe and putamen volume.9

The sources of the heterogeneity in previously reported findings are likely to be multifactorial and complex. First, there may be true biological variability across cohorts, which may derive from clinical phenotypes such as disease severity and duration, medication status and history, and co-morbidity. One much debated source of bias in the study of brain volumetric associations with BD is the effect of mood-stabilizing medications, primarily lithium, which may influence brain structure in individual studies,10 meta-analyses,11 as well as mega-analyses.9 However, it is difficult to distinguish unique effects of pharmacological agents on brain volumes from concurrent effects of clinical and demographic variables, which are likely to interact with medication status, and may even influence which medications are prescribed. Second, variability in the neuroimaging data acquisition, processing and analysis protocols can affect the sensitivity and apparent variability of the brain measures, making it challenging to compare different studies directly.

To address several of these issues, we formed an international collaboration for the study of BD as part of the Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA) Consortium.12 Here, we aimed to identify subcortical brain volumetric changes13, 14 that may consistently distinguish BD patients from healthy controls using a coordinated meta-analysis approach that builds on the work from Hallahan et al.9 A number of the studies included in this ENIGMA project have examined volumetric brain differences previously (see Supplementary Table 2 citations), but this effort combines many new sites and additional data in a coordinated analysis. We also aimed to examine and characterize sources of heterogeneity in brain imaging volumetric indices using exploratory analyses based on BD subtype (I or II), age at illness onset, commonly prescribed medications, and mood state. In the present study, we focused on subcortical structures for three primary reasons: (1) they are critically involved in emotional response and memory, hallmark behavioral features of BD;15, 16, 17 (2) they are reliably measured across sites, as detailed in extensive comparisons of multi-site/scanner analyses (see Supplementary Tables 5 and 6a and h);18, 19 and (3) they are key components of dysregulated network connections that interplay with the cortex.20

Materials and methods


Samples contributing to this project and demographics by site are given in Supplementary Table 1. Participating sites were contacted to collaborate on this project through the ENIGMA Consortium, after advertising the goals of the project on the ENIGMA website and at numerous international conferences in neuroimaging and psychiatry; none of the sites contacted refused or were unable to participate in this project. In total, data from 4304 subjects including 1710 cases and 2594 healthy controls were available for analysis. Each participating site obtained approval from an institutional review board or local ethics committee, and all study participants signed informed consent documents at their local institution.

Image acquisition and segmentation

High-resolution T1-weighted MRI brain scans were acquired at each of the 20 participating sites. Detailed information on scanner sequence and acquisition parameters is found in Supplementary Table 2. A description of image and volume segmentation quality control is given in Supplementary Note 1.

Specification of statistical models and coordinated analysis across sites

The present study involved coordinated analysis of brain structural MRI scans. Our primary focus was on the mean volumetric differences between patients with BD and healthy controls in seven subcortical brain structures: nucleus accumbens, caudate, globus pallidus, putamen, amygdala, hippocampus and thalamus. We also assessed lateral ventricular volume and total intracranial volume (ICV). Within each sample, we used multiple linear regression to quantify the differences between BD patients and healthy controls, while accounting for age, sex and differences in head size (ICV) as covariates. For each structure, we calculated effect size estimates using Cohen’s d, adjusted for age, sex and ICV, using the t-statistic from the diagnosis predictor variable (coded as patients=1; controls=0).21 As an extension of this model, we examined age-by-diagnosis and sex-by-diagnosis interactions using the t-statistic from the interaction predictor variable (while leaving age, sex and diagnosis predictors in the model) to calculate an adjusted Cohen’s d effect size estimate. A description of the meta-analytical framework and reported statistics is given in Supplementary Note 2.
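The within-site model described above can be sketched as follows. This is a minimal illustration, not the consortium's analysis script: the function name `adjusted_cohens_d`, the synthetic data and the t-to-d conversion d = t(n1 + n0)/(sqrt(n1 n0) sqrt(df)) are assumptions, the conversion being one commonly used formula for a covariate-adjusted d; the exact expression used in the study is given in its cited reference and supplement.

```python
import numpy as np

def adjusted_cohens_d(volume, diagnosis, age, sex, icv):
    """Return (t, d) for the diagnosis term of volume ~ diagnosis + age + sex + ICV."""
    volume = np.asarray(volume, dtype=float)
    diagnosis = np.asarray(diagnosis, dtype=float)  # patients=1, controls=0

    def zscore(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    # Z-scoring age and ICV only improves numerical conditioning; with an
    # intercept in the model it leaves the diagnosis t-statistic unchanged.
    X = np.column_stack([np.ones_like(volume), diagnosis,
                         zscore(age), np.asarray(sex, dtype=float), zscore(icv)])
    n, k = X.shape

    # Ordinary least squares fit and residual variance.
    beta, *_ = np.linalg.lstsq(X, volume, rcond=None)
    residuals = volume - X @ beta
    df = n - k
    sigma2 = residuals @ residuals / df

    # t-statistic of the diagnosis coefficient (column 1 of X).
    se_diag = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t = beta[1] / se_diag

    # Assumed conversion from t to a covariate-adjusted Cohen's d.
    n1 = diagnosis.sum()
    n0 = n - n1
    d = t * (n1 + n0) / (np.sqrt(n1 * n0) * np.sqrt(df))
    return t, d

# Synthetic single-site example (hypothetical numbers, not study data).
rng = np.random.default_rng(0)
n = 400
diagnosis = np.repeat([0, 1], n // 2)
age = rng.uniform(18, 65, n)
sex = rng.integers(0, 2, n)
icv = rng.normal(1.5e6, 1.5e5, n)
volume = 4000 + 0.001 * icv - 5 * age - 300 * diagnosis + rng.normal(0, 300, n)

t_stat, d = adjusted_cohens_d(volume, diagnosis, age, sex, icv)
print(f"t = {t_stat:.2f}, adjusted d = {d:.3f}")
```

A negative d here means lower volume in patients, matching the sign convention used for the reported effect sizes.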

Analysis of differences in diagnosis subtype, mood state, age of illness onset and medications

To investigate brain differences associated with the clinical phenotype we examined whether there were detectable differences between BD subtypes, focusing on patients diagnosed with BDI and BDII. Methods for determining subtype at each site are given in Supplementary Table 3. We performed three separate meta-analyses comparing: BDI patients with controls, BDII patients with controls and BDI with BDII patients following the same model as in the section described above. Similarly, we examined how mood state at the time of scanning influenced brain structure. Mood state data were available from 13 sites; patients were categorized into six different categories: euthymic, depressed, manic, hypomanic, mixed and unknown. Tabulated results showing the number of subjects with a given mood state at the time of scanning are available in Supplementary Table 1. A further description of mood state, age of illness onset, medications and the tests performed can be found in Supplementary Note 3.
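Each of these comparisons pools per-site effect sizes by meta-analysis. The study's own framework is described in Supplementary Note 2 and is not reproduced here; as a generic illustration only, the sketch below shows inverse-variance fixed-effect pooling of Cohen's d. The function names, the large-sample variance approximation for d and the example numbers are all assumptions, and the published analysis may use a different (for example random-effects) model.

```python
import math

def d_variance(d, n_cases, n_controls):
    """Common large-sample approximation to the sampling variance of Cohen's d."""
    n = n_cases + n_controls
    return n / (n_cases * n_controls) + d ** 2 / (2 * n)

def pool_fixed_effect(effects):
    """Pool (d, n_cases, n_controls) tuples with inverse-variance weights.

    Returns the pooled d, its standard error and an approximate 95% interval.
    """
    weights = [1.0 / d_variance(d, n1, n0) for d, n1, n0 in effects]
    pooled = sum(w * d for w, (d, _, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical per-site results: (d, number of cases, number of controls).
sites = [(-0.30, 60, 90), (-0.15, 120, 200), (-0.25, 45, 50)]
pooled_d, pooled_se, ci = pool_fixed_effect(sites)
print(f"pooled d = {pooled_d:.3f} (95% CI {ci[0]:.3f}, {ci[1]:.3f})")
```

Larger sites receive larger weights because their effect sizes have smaller sampling variance, which is the basic reason a coordinated multi-site analysis gains precision over any single cohort.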

Determination of statistical significance threshold

In total we performed 225 tests (25 separate analyses looking at nine mean brain volumes). We controlled for multiple comparisons using the false discovery rate22 at q=0.05, which corresponds to a P-value significance threshold of Pthresh<4.91 × 10−3. Throughout the manuscript we report uncorrected P-values, but indicate when a test is significant.
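To make the relationship between the q level and a P-value threshold concrete, the sketch below implements the Benjamini-Hochberg step-up procedure. This assumes the false discovery rate method used is the Benjamini-Hochberg variant, which the text here does not state explicitly; the function name `bh_threshold` and the ten example P-values are hypothetical, and the 225 P-values behind the reported threshold are not reproduced.

```python
import numpy as np

def bh_threshold(p_values, q=0.05):
    """Return the largest P-value declared significant by Benjamini-Hochberg.

    Sorted P-values p_(1) <= ... <= p_(m) are compared with q * i / m; every
    test with a P-value at or below the largest passing p_(i) is significant.
    Returns None if no test passes.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    critical = q * np.arange(1, m + 1) / m
    passing = np.nonzero(p <= critical)[0]
    if passing.size == 0:
        return None
    return p[passing[-1]]

# Hypothetical example with ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
threshold = bh_threshold(example_p, q=0.05)
print(threshold)  # 0.008: only the two smallest P-values pass
```

The resulting threshold depends on the whole set of observed P-values, which is why a study-wide cut-off such as the one reported above is specific to the tests actually performed.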


Subcortical volume differences between patients with BD and healthy individuals

Patients with BD had significantly lower bilateral mean volumes of the hippocampus (d=−0.232; P=3.50 × 10−7) and thalamus (d=−0.148; P=4.27 × 10−3), and a trend toward lower amygdala volume (d=−0.108; P=7.65 × 10−3) that did not pass the study-wide significance threshold. Patients also had significantly larger lateral ventricles (d=0.260; P=3.93 × 10−5) than healthy controls. None of the other five structures investigated (accumbens, caudate, globus pallidus, putamen and ICV) showed significant differences between BD cases and controls (Figure 1). Mean volumes (and s.e.), corrected for age, sex and ICV, by site and by diagnosis are available in Supplementary Table 7. Unadjusted means split by site and by diagnosis are available in Supplementary Table 8. Forest plots of effect sizes for each structure across all sites are shown in Supplementary Figure 1a and c.

Figure 1

Adjusted Cohen’s d estimates for all BD patients versus controls. Effect sizes for the volumetric differences between bipolar disorder (BD) cases and controls (CTL), after accounting for age, sex and intracranial volume over all brain regions of interest. Error bars show mean effect size ± s.e.m. Effect sizes were considered significant (marked with *) if they exceeded the study-wide significance threshold (P<4.91 × 10−3).

Interactive effects of age and sex with BD diagnosis

We examined each of the nine brain structures in our study for age-by-diagnosis and sex-by-diagnosis interactions. We found significant evidence of decreased hippocampal volume in older patients (d=−0.104; P=3.81 × 10−3). No other structures had significant interactive effects between age and diagnosis (Supplementary Table 9). Further, we tested for sex-by-diagnosis interactions for each structure and found significant evidence of increased thalamus volume in female patients with BD (d=0.202; P=9.65 × 10−5). No other structures had significant interactive effects between sex and diagnosis (Supplementary Table 10).

Effects of bipolar diagnosis subtype, mood state and age of onset on subcortical brain volumes

We did not find any significant volumetric differences when directly comparing BDI with BDII patients (Supplementary Table 11). When comparing BDI and BDII separately with controls, the direction of effects (an increase or decrease in volume) for each structure was similar regardless of subtype (Supplementary Tables 12–13). However, the magnitude of case–control differences on brain volumes was consistently larger in patients with BDI (Figure 2). Patients diagnosed with BDI had significantly smaller volumes of the hippocampus (d=−0.203; P=1.31 × 10−3) and amygdala (d=−0.117; P=3.63 × 10−3), and larger lateral ventricle volumes (d=0.251; P=4.70 × 10−3) compared with controls. In contrast, none of the subcortical brain volumes of BDII patients were significantly different from healthy controls. Further, we did not find evidence of an association between brain volume differences in any structure with age of onset (Supplementary Table 14).

Figure 2

Adjusted Cohen’s d estimates for BD patients split by diagnosis subtype (type I or type II) versus controls. Effect sizes for the volumetric differences between bipolar disorder (BD) type I and controls (CTL) are shown in red and BD II and controls are shown in green. All effect sizes are reported after accounting for age, sex and intracranial volume. Error bars show mean effect size ± s.e.m. Effect sizes were considered significant (marked with *) if they exceeded the study-wide significance threshold (P<4.91 × 10−3).

When examining case/control differences across sites with mood state data available, we found a significant increase in lateral ventricular volume in patients compared with controls (d=0.318; P=6.06 × 10−4). Mean hippocampal volume was decreased at a nominally significant level (d=−0.214; P=0.016), but was not strictly significant after correction for multiple comparisons (Supplementary Table 15). We found that a subset of euthymic patients (n=401) had significantly decreased hippocampal volume compared with healthy controls (d=−0.233; P=1.53 × 10−3). None of the other structures showed significant differences (Supplementary Table 16). In addition, we found that a subset of depressed patients (n=134) showed a trending significant increase in lateral ventricular volume compared with healthy controls (d=0.377; P=8.00 × 10−3). None of the other structures showed significant differences between depressed patients and controls (Supplementary Table 17). A direct comparison between euthymic and depressed patients was not possible given small sample sizes (the number of studies with both euthymic and depressed patients was too low; n=6).

Medication effects on brain volume in patients with BD

We examined the effect of treatment at the time of scanning with lithium, antiepileptics, antidepressants, atypical and typical antipsychotics on brain volume in BD patients (Supplementary Tables 18–32). We found that patients treated with lithium at the time of scanning had larger thalamic volumes compared with patients not taking lithium (d=0.207; P=7.31 × 10−4). In addition, we found that patients taking antiepileptics had significantly reduced hippocampal volume (d=−0.189; P=4.91 × 10−3) compared with patients not taking antiepileptics (Supplementary Figure 2a). We further performed a comparison between the brain volumes of patients taking (or not taking) a given medication with the brain volumes of healthy controls. Patients who were not taking lithium at the time of scanning had significantly smaller hippocampal and thalamic volumes and larger lateral ventricles compared with controls (Supplementary Figure 2; Supplementary Table 19), whereas in patients receiving lithium therapy hippocampal volumes were comparable to controls.


One of the major points of uncertainty in BD has centered on potential volumetric changes in the hippocampus. Two large single-site studies reported a significant hippocampal reduction23, 24 but other multi-site studies reported no significant differences.3, 4, 5, 9 Also, smaller hippocampal volumes relative to controls were reported in a meta-analysis of BD patients not taking Li11 with Cohen's d=−0.36 (−0.55, −0.17) for the left hippocampus and d=−0.38 (−0.63, −0.13) for the right. Our estimated Cohen's d=−0.24 (−0.37, −0.12) for non-lithium treated patients is largely in agreement. We re-ran the meta-analysis excluding data from the two single-site studies mentioned previously (Fears et al.24 and Rimol et al.23) and observed a nearly identical reduction of hippocampal volume (d=−0.220; P=6.95 × 10−5) (Supplementary Table 33). Upon further examination, we found a significant age-by-diagnosis interaction whereby increasing age in patients was associated with a decrease in hippocampal volume (d=−0.104; P=3.81 × 10−3; Supplementary Table 9). This finding may reflect accelerated hippocampal atrophy in patients or progressive effects of chronic illness or medication on the hippocampus of patients.

Our finding of enlarged lateral ventricle volume (Cohen’s d=0.26) is in line with a previous mega-analysis by Hallahan and colleagues who reported a d=0.15 for right lateral ventricle.9 Because the UCLA-BP study includes unaffected relatives as controls we re-ran the analysis with that study excluded but this did not change the results (Supplementary Table 35). Similarly, exclusion of data with poor age matching (the Halifax and CLING studies) did not alter the results (Supplementary Table 36).

We found a significant reduction in thalamus volume in BD patients as in our previous study by Rimol et al.23 However, none of the comparable multi-site meta-analyses showed effects significantly different from zero for the thalamus.3, 4, 5, 9 To further examine case–control differences in thalamic volume we undertook additional analyses. First, we removed the data from the study by Rimol et al. and re-ran the analysis. Case–control differences in thalamic volume remained nominally significant (d=−0.11; P=0.013; Supplementary Table 34). Second, we re-ran the analysis excluding data from the UCLA-BP study and found that the volume reduction in the thalamus remained nominally significant (Supplementary Table 35). Third, when the Halifax and CLING samples were removed the effect on thalamus volume was reduced but still nominally significant (Supplementary Table 36). Finally, we found evidence of a significant sex-by-diagnosis interaction whereby female patients with BD had significantly increased mean thalamic volume (d=0.202; P=9.65 × 10−5; Supplementary Table 10). However, this finding conflicts with previous reports showing no evidence of sex-by-diagnosis interactions.23 The cause of the sex-by-diagnosis interactive effect is unknown at this time, but deserves further investigation.

Previous literature-based meta-analyses had been unable to detect a case–control difference in the amygdala. Individual studies in adults with BD had reported either increase or no change in amygdala volume.25, 26 In contrast, reduced amygdala volume has been repeatedly reported in adolescents with BD27 and has been attributed to abnormalities in the developmental trajectory of this region in adolescence.28 In the present study, case–control differences in amygdala volume were only nominally significant when the UCLA-BP, Halifax and CLING studies were excluded. No significant effects of age of illness onset (Supplementary Table 14) or age-by-diagnosis interactions (Supplementary Table 9) were detected in amygdala volume, but this hypothesis might be better tested in a data set that specifically includes adolescents. Further, brain changes over time related to age of illness onset and duration are best studied in longitudinal models rather than cross-sectional evaluations of large cohorts.29, 30

We also examined the effect of the FreeSurfer version used for segmentation. We found that the between-version agreement (Supplementary Table 6a and h) was largely in line with the within-version test–retest reliability (Supplementary Table 5). The only case that seems problematic is the v4.2 segmentation of the globus pallidus, which seems to differ consistently from estimates from the other versions. Only one of the data sets in our consortium used v4.2 (UCLA-BP) and the globus pallidus results are unchanged when running the meta-analysis with that study dropped (see Supplementary Table 35). This suggests that subcortical volumes segmented with different versions of FreeSurfer are comparable.

The current findings of subcortical reduction in BD demonstrate the sensitivity of volumetric assessment, but do not necessarily implicate pathogenic specificity of subcortical structures in BD. van Erp et al.31 reported significantly lower hippocampus, thalamus and amygdala volumes and significantly larger lateral ventricles and globus pallidus volumes in patients with schizophrenia relative to healthy controls, suggesting that BD and schizophrenia may share some common pathogenic mechanisms associated with medial temporal lobe volume reduction, which has been hypothesized to be related to excessive glucocorticoid activity.31, 32, 33 In general, the effects of schizophrenia on subcortical brain volumes seem to be stronger than those of BD. This trend has been observed in prior single-site studies,23 which may be related to the relatively greater neurodevelopmental disruption characteristic of schizophrenia.34, 35 Similarly, Schmaal et al.36 examined subcortical differences in major depressive disorder and found significant reductions in the hippocampal volume of major depressive disorder patients compared with healthy controls. However, the observed effect sizes and pattern of effects seem to be much milder than those observed in schizophrenia and BD.

We tested whether BDI (n=862) or BDII (n=317) are associated with similar brain structural changes. We did not detect any significant differences between patients with BDI and BDII diagnoses (Supplementary Table 11) although we would have 80% power to detect differences of d=0.087 (Supplementary Note 4). When comparing each subtype separately with healthy controls, we found that the pattern of case–control differences in the volume of hippocampus, amygdala, thalamus and lateral ventricles was similar for both subtypes (Supplementary Tables 12–13). These differences were more pronounced in BDI patients, who showed significantly smaller hippocampal and amygdala volumes and significantly larger lateral ventricle volumes relative to controls. In contrast, there were no significant differences between BDII patients and controls (Figure 2). The literature examining volumetric brain differences in BDII is decidedly sparse4, 5 and therefore most, but not all,4 large literature-based meta-analyses prior to this study grouped all BD subtypes together, likely due to under-reporting of diagnosis subtype in individual studies. Furthermore, we found that the effect sizes when examining BDI patients alone were slightly smaller than the effect sizes in a model that includes all BD patients regardless of subtype (Table 1; Supplementary Table 12). These results in the current samples suggest that there were no detectable volumetric differences in subcortical brain structures between BDI and BDII. The lack of detected differences between the two subtypes mirrors findings in genetics, where studies were also unable to find significant genetic patterns that distinguish BDI and BDII patients despite large sample sizes.37 Future work is needed to further disentangle the complex factors that may manifest as distinct BD subtypes but have similar genetic and volumetric brain patterns.

Table 1. Effect size differences between all BD cases and controls (Cohen’s d) for the mean volume of each structure, controlling for age, sex and intracranial volume

We examined the effect of mood state at the time of scanning to assess whether mood moderates BD effect size differences. We examined only euthymic and depressed mood states compared with controls due to the small sample sizes available of patients scanned during a hypomanic, manic or mixed state. We found that euthymic patients had significantly decreased hippocampal volume compared with controls (d=−0.233; P=1.53 × 10−3), a difference that was not detected in depressed patients. However, none of the other comparisons made were significant after multiple comparisons correction. Further, a direct comparison between euthymic and depressed patients was not possible given the small sample sizes. As the decrease in hippocampal volume was only detected in euthymic patients and not in depressed patients, further investigation is warranted. However, caution is needed in interpreting these results given differences in sample size between the two groups. A direct comparison between the two groups would provide a more definitive assessment in the future. The findings related to mood state highlight some of the limitations and weaknesses of our approach whereby neurobiological findings cannot readily be mapped onto clinical variables and therefore have limited clinical value.

Medication is arguably one of the most widely debated sources of heterogeneity in the brain morphological signature of BD. Much of the BD literature has focused on the effects of lithium.10, 38, 39 In animal models, lithium has been shown to have neurotrophic effects in the hippocampus40 but the mechanism of action is unclear.41, 42 We performed the largest comparison of BD patients taking (n=535) and not taking (n=845) lithium to date (Supplementary Table 20). We found that patients taking lithium had significantly larger thalamic volumes than patients not taking lithium (d=0.207; P=7.31 × 10−4). However, significant effects of lithium on thalamic volume were not detected in other large studies of BD.5, 9 We did not observe significant effects of lithium in other structures (that is, hippocampus and amygdala), both of which have been studied extensively in the literature.5, 9, 11 In addition to the moderating effects of lithium, we examined the effects of treatment with four commonly prescribed medication groups: antiepileptics, antidepressants, atypical and typical antipsychotics (Supplementary Figure 2). It is difficult to distinguish medication effects on brain volumes from concurrent effects of clinical and demographic variables, which interact closely with medication status.43, 44 It is also possible that patients who recently stopped taking a particular medication were classified as ‘not currently taking a medication’, potentially biasing medication effect estimates. The quality of information concerning medication effects on brain correlates can be improved using longitudinal methods along with tightly controlled medication access and reporting. Further longitudinal studies of specific medications are needed to disentangle medication effects on the brain from other moderating effects or disease effects.

A major strength of this study is the large sample size and the ability to harmonize the data as far as possible, which, in the case of imaging data, can be quite a daunting task. However, the current study has limitations regarding the heterogeneity of the sample and potential clinical confounds. Thus, the current findings should be interpreted with caution and need to be replicated in independent cohorts. In addition, there are several key limitations to our study: (1) Samples were collected and analyzed at different sites; we coordinated our work to maximize homogeneity across sites, but differences persist. We showed that scanner field strength, voxel volume and the version of FreeSurfer used for segmentation do not significantly influence (P>0.05) the effect size estimates in our analysis (Supplementary Table 37). (2) We examined the moderating effects of medication, but caution is needed in interpreting these effects, as differences in medication status are likely to interact with illness characteristics. Differences in the clinical definitions of medication use and history could potentially confound the estimates in this study in unexpected ways. Also, we did not test or account for possible interactions between different pharmacological agents although the majority of patients were treated with a combination of agents. However, longitudinal studies of medication effects show that most medications are likely to have a null or deleterious effect on the brain45 with the exception of lithium, which may have neuroprotective effects.39 (3) Most sites used the structured interview (DSM-IV) for diagnosis, but some heterogeneity in diagnosis inevitably exists across sites and findings should be interpreted with care as they reflect the heterogeneity inherent to a large multicenter/multinational comparison. (4) Drug and alcohol dependence or abuse may be another source of heterogeneity. These data were not uniformly available within the cohorts in this study, but the potential impact of alcohol abuse on ventricular volume has been demonstrated before,46 and is being analyzed by ENIGMA’s addictions working group. (5) Finally, our hypotheses explore the effects of BD on subcortical brain structures and we did not assess effects elsewhere in the brain (for example, cortex, white matter tracts). It is known that neural networks subserve emotional processing and regulation and that these almost certainly engage a number of cortical structures. However, the current study was limited to subcortical data. The analysis of cortical data, which is currently ongoing, will need another analytical approach with additional methodological challenges.

We have demonstrated a pattern of reduced volumes of the hippocampus, thalamus and amygdala in patients with BD. Although one should avoid making strong functional interpretations based on brain volume differences alone, these findings pertain to neurocircuitry implicated in emotional processing and executive behavior47 and nevertheless provide important information regarding the neurobiology of BD.48 Several functional MRI studies report dysfunction in these regions during manic or depressive episodes,49 and post mortem gene expression studies have also implicated these structures.50 Mood-stabilizing drugs may also act in these regions.51 Further, we provide strong statistical evidence of a lack of difference in the biological signature of BD subtypes. These findings should be interpreted with caution given the limitations outlined throughout the manuscript, especially with consideration for the heterogeneity involved in using meta-analysis to combine neuroimaging data across sites.