Major depressive disorder (MDD) is a leading cause of disability [1, 2]. Unfortunately, a substantial portion of patients fail to respond to first-line treatments (40–60%; [2, 3]), and only a minority achieve full remission [4]. Choosing optimal, effective treatments early in the course of illness and finding ways to predict treatment outcomes are important goals of research in MDD.

Brain measures obtained from magnetic resonance imaging (MRI) have demonstrated potential value in differentiating healthy controls from individuals diagnosed with MDD [5, 6], and in predicting treatment responses in MDD patients [7, 8]. The anatomy and function of the hippocampus have been a particular focus of research because of work suggesting that it is important in the pathophysiology of depression and response to treatment [9, 10]. Smaller hippocampal volumes in patients with depression compared to healthy participants are now well-documented [5, 11, 12].

While there are reports of an association between structural changes in the hippocampus and clinical variables in smaller studies [6, 9, 13], large multi-site datasets have not had standardized clinical data to permit replication of these findings. For example, the MDD working group from The Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA) study reported significant associations between MDD and alterations in subcortical gray matter volumes, including total hippocampal volumes (THV) [5]. The ENIGMA dataset includes patients who underwent a variety of treatment interventions, making it difficult to ascertain whether these changes in gray matter volumes predict the outcome of a specific therapeutic intervention.

More recently, several multi-site studies have applied common treatment and imaging protocols to study the relations between brain structure and function, and the outcome of specific interventions. For example, the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical care (EMBARC) project examined brain changes in cortical thickness during the first week of antidepressant medication (ADM) treatment [14]. Thickening in the rostral anterior cingulate cortex (rACC) was associated with the change in symptom severity during a trial week of sertraline, but baseline THV data did not inform the ADM treatment response prediction algorithm [14]. The International Study to Predict Optimized Treatment in Depression (iSPOT) collaborative designed studies to identify pre-treatment structural MRI measures that could predict acute treatment outcomes [15, 16]. In one of these investigations, Maller et al. [16] found that larger hippocampal tail (Ht) volume predicted clinical remission, independent of total brain volume (TBV), age or THV [16].

Although there are divergent findings about whether THV predicts treatment outcomes [17,18,19], converging lines of evidence now suggest that subfield-specific changes in hippocampal structure may be a reliable biomarker for treatment remission [8, 16]. Several studies using manual segmentation of hippocampal subregions reported that localized hippocampal volumes might be associated with the rate, extent, and maintenance of clinical response to ADM [7, 20,21,22]. Notably, even earlier reports showed that hippocampal volumes predict treatment outcomes over the longer term [12, 23]. However, these studies did not examine longitudinal changes in subfield-specific hippocampal volumes.

More recent studies employed the automated segmentation software FreeSurfer 6.0 with integrated hippocampal subfield-specific atlases (FS6.0-sf) [24]. Maller et al. [16] detected Ht volume group differences between remitters and non-remitters to ADM with the help of an atlas-building algorithm that can measure 12 hippocampal subfields using T1-weighted MRI images as input [24, 25]. Cao et al. [8] employed the same segmentation pipeline to identify patients likely to achieve remission following electroconvulsive therapy and to examine hippocampal subfield volumes in patients with bipolar disorder and first episode psychosis [26,27,28]. Using a more advanced approach, combining T1 and T2* images, Roddy et al. [6] have also shown selective subfield-specific differences between MDD and healthy control participants. The updated FS6.0-sf segmentation pipeline (cross-sectional [25]) has outperformed previous segmentation strategies on reliability parameters [29]. The FS6.0-sf workflow can be run with default settings (for instructions see ref. [25]), which reduces the likelihood of methodological divergence between study sites and consequently improves the likelihood of replication. This is important because the lack of reproducibility in neuroimaging research is a growing concern [30], and may be partially explained by methodological inconsistencies.

Given the established literature describing the connection between hippocampal pathology and MDD, it is surprising that only a few studies have aimed to reproduce findings on hippocampal subfield-specific volumes as a predictor of ADM response/remission. In this report, we therefore aimed to replicate and extend the finding [7, 16] that Ht volumes predict remission with antidepressant treatment, in a large and strictly independent sample of MDD patients undergoing a uniform ADM treatment protocol. We hypothesized that Ht volumes would be smaller in patients with MDD compared to controls. Furthermore, based on previous work [7, 16], we expected that larger baseline Ht volumes would be associated with an increased probability of response/remission following ADM treatment.

We incorporated outputs from both automated (FS6.0-sf) and manual segmentation protocols, analyzing data gathered from The Canadian Biomarker Integration Network in Depression (CAN-BIND), the details of which are available elsewhere [31]. We analyzed baseline neuroimaging and clinical data and assessed for response and remission status at weeks 8 and 16. As a secondary goal, we investigated whether hippocampal volumetry informs not only the degree of improvement, but the rate at which improvement occurs in patients treated with ADM.

Methods and materials

Study participants

Study participants were recruited from six academic health centers across Canada. The initial sample included 196 participants with MDD and 110 healthy comparison (HC) participants who met CANBIND-1 inclusion and exclusion criteria (see ref. [31] for details). Briefly, the Mini-International Neuropsychiatric Interview (MINI) was used to confirm group assignment [32]. Patients were included if they scored ≥24 on the Montgomery-Åsberg Depression Rating Scale (MADRS) [33] at their baseline visit. Patients were either ADM naive or had a wash-out period of at least five half-lives for psychotropic medications before receiving 10–20 mg/day escitalopram. Patients whose symptoms did not improve with the initial treatment after 8 weeks of monotherapy received adjunctive aripiprazole 2–10 mg/day for an additional 8 weeks.

Treatment response was defined as a reduction in MADRS score of at least 50% from the baseline score. Remission was defined as a MADRS score of ≤10. Clinical assessments were conducted by practicing psychiatrists at each study site; treatment outcomes were assessed at multiple time points, including weeks 8 and 16 (for the procedural details see ref. [31]). We analyzed both response and remission status at 8 and 16 weeks. For clarity, the response and remission statuses were analyzed separately. Participants who responded/remitted at both 8 and 16 weeks were considered early responders/remitters. Participants who responded/remitted only at the end of the study were considered late responders/remitters.
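The outcome definitions above can be written down as a short Python sketch. This is an illustration only, not the study's analysis code (the analyses were run in SPSS); the names `Outcome`, `classify_visit`, and `timing` are hypothetical, and the "other" label for participants who met a criterion at week 8 but not at week 16 is an assumption, since the text does not name that pattern.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    response: bool   # MADRS reduced by at least 50% from baseline
    remission: bool  # MADRS score of 10 or less


def classify_visit(baseline_madrs: float, visit_madrs: float) -> Outcome:
    """Apply the response and remission definitions to one follow-up visit."""
    if baseline_madrs <= 0:
        raise ValueError("baseline MADRS must be positive")
    reduction = (baseline_madrs - visit_madrs) / baseline_madrs
    return Outcome(response=reduction >= 0.5, remission=visit_madrs <= 10)


def timing(status_week8: bool, status_week16: bool) -> str:
    """Label the timing of response (or remission) across the two visits."""
    if status_week8 and status_week16:
        return "early"   # criterion met at both week 8 and week 16
    if status_week16:
        return "late"    # criterion met only at the end of the study
    if not status_week8:
        return "none"    # criterion never met
    return "other"       # assumption: met at week 8 only; not named in the text
```

Response and remission are kept as separate flags so that each can be passed to `timing` independently, mirroring the separate analysis of the two statuses.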

The CAN-BIND sample excluded participants with an Axis I diagnosis other than MDD as a primary diagnosis; a significant Axis II diagnosis; substance abuse within the past 6 months; a history of adverse reactions to escitalopram; or a history of neurologic disease or head trauma. The healthy comparison group included participants 18–60 years of age with no history of Axis I or Axis II disorders as determined by the MINI [32]. All eligible study participants provided written informed consent for all procedures after a complete description of the study.

MRI data acquisition and processing

The CAN-BIND neuroimaging acquisition protocols have been published [34]. Briefly, all sites followed similar MRI acquisition protocols performed on 3T MR scanners. A whole-brain T1-weighted turbo gradient echo sequence was acquired at 1 mm³ resolution. The pulse sequence parameters were: repetition time (TR) = 6.4–1900 ms; echo time (TE) = 2.2–3.4 ms; flip angle = 8–15°; inversion time (TI) = 450–950 ms; field of view (FOV) = 256 mm; matrix dimensions 220 × 220 and 256 × 256; 155–192 contiguous slices at 1 mm thickness. A vitamin E pill placed at the right side of the participant’s head was used as a stereotactic marker. Data quality control (QC) and data quality assurance (QA) procedures for the CAN-BIND MRI protocols have been described [34].

Automated hippocampal subfield segmentations were performed using FreeSurfer version 6.0. A comprehensive description of this pipeline is provided by Iglesias and colleagues [25]. The pipeline generated THV as well as 12 additional segmentations for hippocampal subregions: hippocampal tail (Ht), subiculum, fissure, presubiculum, parasubiculum, molecular layer (ML), granule cell layer and molecular layer of the dentate gyrus (GC-ML-DG), fimbria, the cornu ammonis (CA) area subdivided into CA1, CA2/3, and CA4, and the hippocampal amygdala transition area (HATA). Additional brain volumes were gathered via the asegstats2table FreeSurfer 6.0 command, which collects statistics from the whole-brain segmentation routine [35]. These measurements include an estimated total intracranial volume (eTIV), total gray matter volume, and total white matter volume. Total brain volumes (TBV) were calculated from FS6.0 output (total gray matter plus total white matter volume).

Manual segmentations of THV were also obtained using the European Alzheimer’s Disease Consortium (EADC)—Alzheimer’s Disease Neuroimaging Initiative (ADNI) Harmonized Protocol (HarP) [36]. Two trained tracers (NN, MM) followed the EADC-ADNI HarP manual [37], and when necessary referred to an atlas of the human brain [38] to ascertain the correct anatomical identification. All baseline scans were manually traced by both tracers blinded to group allocation.

A three-step quality control procedure for hippocampal segmentation is described in Supplementary Material (see 1.1.).

Statistical analyses

All data were analyzed using SPSS version 25.0 (IBM SPSS Statistics for iMac, Armonk, NY, USA). Pearson correlations (r) and intraclass correlation coefficients (ICC) were used to compare automated and manually segmented hippocampal volumetry outputs. To compare hippocampal volumes between groups, simple t-tests and analysis of variance were used. A general linear model (GLM) was used for both regression and univariate analysis of covariance (ANCOVA) to account for covariate terms. Covariate terms were predetermined using a backward multiple linear regression model. We used a receiver operating characteristic (ROC) analysis to evaluate the predictive properties of variables of interest. Statistical analyses were two-tailed, with significance set at an alpha level of 0.05. The Bonferroni correction was used for multiple comparisons only.


Results

Sample characteristics

Table 1 summarizes the demographic and clinical information of the study sample across all time points. At baseline, there were no significant differences in mean ages or other demographic variables between MDD and HC participants, although healthy men were younger (mean age = 32.4) than men with MDD (mean age = 38.7). There were no significant age differences between remitters/responders and non-remitters/non-responders at week 8 or at week 16. For completeness, a schematic flow chart illustrates the schedule of interventions and assessments for the treatment arms in the present study (see Supplementary Fig. 1).

Table 1 Demographic and clinical characteristics of study participants across several time points

General neuroimaging characteristics

THV measurements from FS6.0-sf and manual segmentations were correlated (Cronbach’s Alpha = 0.809, Pearson r = 0.67, ICC = 0.507, p = 0.001), as previously reported in a large longitudinal methodological study [39].

There was a significant difference between left and right THVs according to both segmentation methods (FS6.0-sf: t = −6.13, df = 298, p < 0.05; manual: t = −6.9, df = 305, p < 0.05), so further analyses were conducted separately for left (-lh) and right (-rh) hemispheres. Hemispheric asymmetry analysis was conducted as previously described [39]; the asymmetry index was used to determine the directionality of the lateral asymmetry, and absolute values (ABS) were used to examine the magnitude of the asymmetry regardless of the directionality (ABS|Right-Left|). The asymmetry index identified rightward THV asymmetry with both the manual (HC: +0.18, MDD: +0.22) and FS6.0-sf segmentation methods (HC: +0.14, MDD: +0.19).
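The two asymmetry measures can be illustrated with a minimal Python sketch. The absolute measure follows the ABS|Right-Left| expression given in the text. The exact formula of the signed asymmetry index is defined in ref. [39] and is not restated here, so the normalisation used in `asymmetry_index` below, (Right − Left) / (Right + Left), is an assumption chosen for illustration and is not expected to reproduce the reported index values. Both function names are hypothetical.

```python
def absolute_asymmetry(left: float, right: float) -> float:
    """Magnitude of asymmetry, ABS|Right - Left|, in the units of the inputs."""
    return abs(right - left)


def asymmetry_index(left: float, right: float) -> float:
    """Signed asymmetry; positive values indicate rightward asymmetry.

    Assumption: normalised as (Right - Left) / (Right + Left). The study's
    own index is defined in ref. [39] and may use a different scaling.
    """
    total = left + right
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    return (right - left) / total
```

The sign of the index carries the direction of the asymmetry, while the absolute measure discards direction and keeps only the size of the left-right difference.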

Determinants of hippocampal volumes

The automated segmentation workflow FS6.0-sf uses a probabilistic atlas to estimate hippocampal subfields with a Bayesian inference algorithm that adapts to the particular MRI image intensity characteristics of each scanner [25]. Nevertheless, differences in neuromorphometric data collected across different sites and MR hardware may still be present.

To control for possible between-site variance in hippocampal volumes, we used a backward multiple regression model. THVs were entered as the dependent variable, and study site, age, sex, handedness, and TBV were included as independent variables. This model retained only TBV as a covariate term for left (R² = 0.57, F = 396.6, df = 1, 298, p < 0.001) and right THV (R² = 0.54, F = 356.4, df = 1, 298, p < 0.001). Study site differences did not significantly contribute to explaining hippocampal volume variability (lh: p = 0.963; rh: p = 0.942). A similar procedure was conducted for Ht as a variable of interest, where both TBV and sex were meaningful terms for left (R² = 0.24, F = 29.2, df = 2, 298, p < 0.001) and right (R² = 0.28, F = 60.6, df = 1, 298, p < 0.001) Ht volumes. Therefore, further analyses were conducted using TBV as a covariate term for total volumes, or TBV and sex as covariate terms for Ht volumes.

When controlled for age and sex, participants with MDD had significantly smaller TBVs compared to HCs (F = 4.8, df = 301, p = 0.028); however, there were no significant group differences in eTIV (F = 2.2, df = 301, p = 0.13). As TBV did not include the volumes of the lateral and third ventricles, which are in anatomical proximity to the hippocampus, we also compared ventricular volumes between study participants. Although MDD participants had observably greater ventricle volumes compared to HCs, these differences were not statistically significant (see Supplementary Table 1).

Hippocampal volumes and clinical features of depression


THVs did not differ between groups. Ht volumes, however, were significantly smaller bilaterally in the MDD group compared to HCs (lh: F = 5.5, df = 1, 296, p = 0.019, η² = 0.19, power = 0.65; rh: F = 5.3, df = 1, 296, p = 0.021, η² = 0.18, power = 0.65). Interestingly, manual segmentation detected a significant difference in THV absolute asymmetry between MDD participants and HCs (F = 5.8, df = 1, 301, p = 0.016, η² = 0.19, power = 0.71). Detailed information and group statistics for other hippocampal subfields are provided in Supplementary Table 1.

Treatment outcomes at week 8

Patients who achieved remission by week 8 had larger left Ht volumes at baseline (lh: mean = 566.2 mm³, SEM = 9.1 mm³) compared to patients who did not achieve remission (lh: mean = 540 mm³, SEM = 5.9 mm³) (F = 5.5, df = 1, 167, p = 0.020, η² = 0.033, power = 0.64). In addition, patients who achieved remission by week 8 demonstrated significantly lower baseline Ht absolute asymmetry volumes (mean = 27 mm³, SEM = 4.7 mm³) compared to non-remitters (mean = 45.9 mm³, SEM = 3 mm³) (F = 11, df = 1, 167, p = 0.001, η² = 0.063, power = 0.91).
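As a rough consistency check on the effect sizes above, partial eta squared can be recovered from an F statistic and its degrees of freedom. The Python sketch below assumes that the reported η² is partial eta squared (the effect size that SPSS GLM output typically provides; the text does not state this) and reads the reported degrees of freedom as 1 for the effect and 167 for the error term. The function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2)."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Week-8 remission contrasts, using the rounded F values quoted in the text.
left_ht = partial_eta_squared(5.5, 1, 167)        # roughly 0.032
ht_asymmetry = partial_eta_squared(11.0, 1, 167)  # roughly 0.062
```

Both values land close to the reported 0.033 and 0.063; the small gaps are what one would expect from F values rounded to one decimal place.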

Left Ht volumes also approached statistical significance for associations with the treatment response status at week 8 (F = 3.3, df = 1, 164, p = 0.07, η² = 0.02, power = 0.44). Notably, non-responders were more likely to have greater absolute Ht asymmetry volumes at baseline compared to those who responded at week 8 (F = 4.8, df = 1, 164, p = 0.03, η² = 0.028, power = 0.58). For other subfields, group comparisons are presented in Supplementary Table 2.

Treatment outcomes at week 16

Consistent with findings at week 8, MDD participants who achieved remission by week 16 were more likely to have larger left and right Ht volumes at baseline than patients who did not achieve remission (lh: F = 7.5, df = 1, 150, p = 0.007, η² = 0.048, power = 0.77; rh: F = 6, df = 1, 150, p = 0.01, η² = 0.039, power = 0.68). Of those 96 participants who achieved remission at week 16, only 42 remitted by week 8 and maintained remission until week 16 (see Supplementary Fig. 2). This cohort had significantly lower Ht absolute asymmetry (mean = 25.2 mm³, SEM = 5.3 mm³) compared to patients who only achieved remission by week 16 (mean = 47.5 mm³, SEM = 4.7 mm³, p = 0.01) or those who did not remit at all (mean = 45.4 mm³, SEM = 4.5 mm³, p = 0.018) (F = 5.2, df = 1, 143, p = 0.006, η² = 0.069, power = 0.82, Bonferroni corrected; see Fig. 1).

Fig. 1

Different patterns of hippocampal tail volume disproportions in patients with MDD. Green marker shows distal zone of the hippocampal tail (coronal section). L = left side; white bar indicates 1 cm measurement. Early sustained remission = MDD participants that achieved remission at week 8 and maintained the status until week 16. Late remission = MDD participants that remitted only by the end of the study, at week 16; no remission = MDD participants who did not achieve remission of their symptoms at any point of time during the study

Responders at week 16 were more likely to exhibit larger Ht volumes at baseline compared to participants who failed to respond (F = 11.7, df = 1, 150, η² = 0.072, power = 0.92, p = 0.001) (see Fig. 2). Notably, 3rd ventricle volumes were significantly greater in both non-responders and non-remitters compared to responders (F = 4.2, df = 1, 151, η² = 0.028, power = 0.53, p = 0.04) and remitters (F = 3.9, df = 1, 151, η² = 0.026, power = 0.5, p = 0.048) at week 16 (for details see Supplementary Table 3).

Fig. 2

Quadratic regression model to represent significant associations between the results of total Montgomery–Åsberg Depression Rating Scale and proportional values of the left hippocampal tail. Hippocampal tail (Ht) volumes were divided by volumes of the 3rd ventricle and regressed with total MADRS scores. Quadratic non-linear function appeared to be the best fit to explain the variance in MADRS scores at weeks 8 and 16

Predictive ability of hippocampal tail volume as a proportional variable

As the present statistical analyses yielded consistently higher F-values for the left Ht volumes, we concentrated the predictive analyses on the left hemisphere (see Supplementary Tables 1–3). Detailed results of the ROC analyses are reported in Tables 2 and 3. The left Ht volumes had an area under the curve (AUC) of 0.56 in predicting remission status at week 8 and an AUC of 0.58 in predicting remission at week 16. TBV as a separate variable was not a significant predictor of remission status at week 8 (p = 0.6) or week 16 (p = 0.9). However, when left Ht volumes were considered as a proportional value of the TBV (lh Ht:TBV), we observed stronger predictive characteristics for both week 8 (AUC = 0.59) and week 16 remission statuses (AUC = 0.61). Since TBV did not include the volume of the ventricles, we tested the predictive properties of Ht as a proportional value relative to the anatomically adjacent 3rd ventricle (lh Ht:3rd ventricle). The predictive ability of this proportional variable (lh Ht:3rd ventricle) was observably higher, with AUC = 0.64 for remission status at both time points and AUC = 0.67 as a predictor of the response status at week 16. When left Ht was divided by the absolute asymmetry, the differentiation between early and late remitters reached AUC = 0.70 (CI [0.59, 0.81], p = 0.001).
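To make the proportional variables and the AUC values concrete, the following Python sketch computes a volume ratio and an empirical AUC. It is an illustration only: the study's ROC analyses were run in SPSS, the function names are hypothetical, and the AUC here uses the standard rank-based (Mann-Whitney) estimate, which equals the area under the empirical ROC curve.

```python
from typing import Sequence


def proportion(ht_volume: float, reference_volume: float) -> float:
    """Ht volume as a proportion of a reference volume (e.g. TBV or 3rd ventricle)."""
    if reference_volume <= 0:
        raise ValueError("reference volume must be positive")
    return ht_volume / reference_volume


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a randomly chosen positive case
    (e.g. a remitter) scores higher than a randomly chosen negative case,
    counting ties as one half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Under this reading, an AUC of 0.64 means that in about 64% of remitter/non-remitter pairs the remitter has the larger lh Ht:3rd ventricle value, whereas 0.5 would correspond to chance.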

Table 2 Characterizing predictive ability of hippocampal tail volume as a proportional value
Table 3 The results of receiver operating characteristic analyses

The ratio of lh Ht:3rd ventricle was also correlated with total MADRS scores from week 8 (Pearson r = −0.30, p = 0.001) and week 16 (Pearson r = −0.31, p = 0.001) (see Table 2, upper panel). A quadratic regression model with lh Ht:3rd ventricle as a dependent variable explained 34% of the variability in percent improvement in MADRS at week 8 (R² = 0.34, F = 16.9, df = 2, 68, p < 0.001) and 28% at week 16 (R² = 0.28, F = 15.8, df = 2, 61, p < 0.001) (see Fig. 2). When lh Ht:3rd ventricle was correlated with age separately for the MDD and HC groups, a significant negative correlation was found only within the MDD group (r = −0.423, p = 0.001). Finally, a general linear model with the dependent variable of interest (lh Ht:3rd ventricle) and covariate terms that included sex, age, and lateral ventricles explained ~40% of the variance for both remission and response rates at weeks 8 and 16 (see Table 2, lower panel).


Discussion

Consistent with previous reports, left Ht volumes were reduced at baseline in depressed participants compared to healthy participants. Our primary interest, however, was in determining whether Ht volumes at baseline had the capacity to predict outcome in MDD participants receiving ADM [7, 16]. Regarding outcomes in this sample, approximately one-third of patients achieved remission following 8 weeks of escitalopram and were maintained on escitalopram. Participants who had not responded by 8 weeks had another eight weeks of escitalopram along with adjunctive aripiprazole, increasing the cumulative response and remission rates to 74.7% and 60.8%, respectively. MDD participants with larger left Ht volumes at baseline were, in fact, more likely to achieve remission at week 8 and week 16. This association was specific to Ht volumes, as we did not find a significant association between the outcome of treatment and THV measurements, using either manual or automated segmentation methods.

Consistent with Maller et al. [16], we report that TBV, but not eTIV, measures are significantly smaller in MDD participants compared to HCs. Indeed, TBV measurements appear more relevant and may be further evaluated as a neuroimaging indicator of structural deficits in MDD participants. In our sample, we observed another well-documented finding: the cerebral ventricles of depressed patients were larger than those of healthy participants (for example, ref. [40]). As expected, aberrant patterns of ventricular enlargement were correlated primarily with the age of the depressed participant. Importantly, in the present analysis this curvilinear pattern of ventricular enlargement independently coincided with a significant decrease in Ht volume in depressed participants.

We therefore examined whether the predictive power of the Ht volumes may be enhanced if Ht volume was considered as a proportion relative to the size of the 3rd cerebral ventricle. This proportional variable significantly correlated with MADRS total scores. When adjusted for age, sex, and lateral ventricles, the left Ht:3rd ventricle proportion explained ~40% of the variance in response and remission statuses at both weeks 8 and 16.

Generally, late remitters were characterized by leftward Ht asymmetry and disproportionally small (58%) left Ht volume relative to 3rd ventricle volume. Depressed participants who did not achieve remission even by week 16 were characterized by rightward Ht asymmetry and pronounced disproportions in left Ht (55%). Early sustained remitters exhibited greater left Ht volumes (65%) in relation to the 3rd ventricle volumes without detectable deviations in Ht asymmetry (see Fig. 1).

Therefore, MDD participants with exaggerated patterns of Ht volume disproportion and asymmetry achieved remission later following first initiation of ADM. A disproportionally small Ht may be predictive of the rate of improvement to remission, as well as the overall likelihood of remission. Although Ht volume may help predict who will remit, Ht asymmetry may differentiate early and late remitters or those who are likely to benefit from adjunctive atypical antipsychotic medication.

While the role of the hippocampus and stress in the pathophysiology of depression has been established [9], it remains unclear why the Ht in particular is the most vulnerable to MDD. Anatomically, the initial segment of the tail resembles the body of the hippocampus [41]. The posterior region of the hippocampus is vascularized by a separate group of hippocampal arteries that are prominently anastomosed [42], suggesting its distinct physiological importance. However, in the tail, the CA1 region is heavily folded, and the terminal segments of the arterioles there are particularly vulnerable to anoxia due to stress-induced vasoconstriction [43]. As common ADM pharmacological agents may only partially ameliorate these illness-associated hippocampal deficits [44, 45], it is possible that Ht volume decline reflects a cumulative imprint of previous long-lasting untreated depressive episodes. Small Ht volumes found in non-remitting patients may be a surrogate marker of previously unmanaged depressive episodes that in turn reduce the likelihood of treatment response to current ADM [9]. We note, however, that MacQueen et al. [7] found smaller Ht volumes in patients presenting with a first treated episode of illness; hence, it is less likely that past illness episodes are the sole explanation of the finding. It is also unclear why hippocampal and ventricular asymmetry may have pathophysiological significance for clinical outcomes in patients with MDD. It is possible that occipital bending (Yakovlevian torque [46]) is relevant to the observed link between hippocampal tail asymmetry and remission status [47].

Many neuroimaging studies have documented global alterations in hippocampal volumes not only in depression but also in psychotic disorders [48] and other stress-sensitive illnesses such as post-traumatic stress disorder (both subfield-specific [49] and global changes [50]). This may reflect a general vulnerability to stress-related changes independent of the discrete nature of the illness. Interestingly, bipolar disorder is one illness where small volumes [51] have been less reliably reported, raising questions about the neuroprotective effects of common treatments for bipolar disorder, such as lithium [52, 53].

There are some limitations to the present work. Although the analyses of covariance included sex and age, sex-specific differences in hippocampal structure were not explored in relation to ADM outcomes, as more female participants both entered and remained in the study. The main limitation of the FS6.0-sf segmentation workflow using 1 mm T1-weighted images as input is that the position of the internal boundaries between the hippocampal subregions relies heavily on probabilistic estimation; thus, the volumes of internal subfields of the hippocampus should be interpreted with caution. However, volume estimations of the hippocampal tail and fimbria do not suffer from this technical limitation. While a strength of the study was the consistent algorithmic approach to treatment, this feature also limits our capacity to make inferences regarding other classes of antidepressants or other adjunctive strategies. We examined response status in participants at week 8 and week 16. In general, the pattern of results was not as stable when using response as the outcome measure, and this appears consistent with reports in the literature that favor remission as the outcome measure more reliably predicted by volumetric analyses [7, 8, 16].

To the best of our knowledge, there have been no previous attempts to replicate Ht volume as a marker of treatment outcome in a large independent sample employing a uniform, standardized image processing workflow and a uniform intervention protocol. This study adds to an emerging body of clinical literature that consistently reports that Ht volumes predict the outcome of ADM treatment in participants with MDD. We also show that the degree of disproportion of Ht may be a more informative biomarker than Ht volume on its own and may potentially move beyond group-level associations to predict treatment outcome. Future studies should confirm whether this prognostic feature of Ht volumes extends to other treatment modalities.