Major depressive disorder (MDD) is the leading cause of disability worldwide and a substantial percentage of patients with MDD will not fully remit or will experience new episodes after remission (Simon et al, 2004). Such a non-remitted course may predict an unfavorable chronic course of MDD (Rush et al, 2006; Trivedi et al, 2006). However, it is currently unclear which patients will show a favorable or unfavorable course. Studies on treatment response have indicated the importance of structure and functioning of the lateral prefrontal cortex (PFC) (Heller et al, 2013; Langenecker et al, 2007; Ritchey et al, 2011), medial PFC (Ritchey et al, 2011), anterior cingulate cortex (ACC) (Fu et al, 2013; Kemp et al, 2008; Pizzagalli, 2010), insula (Fu et al, 2013; Langenecker et al, 2007; McGrath et al, 2013), amygdala (Langenecker et al, 2007), and hippocampus (Frodl et al, 2004; Fu et al, 2013) for predicting remission following treatment. However, treatment-response markers are not necessarily markers of a naturalistic course of depression. A naturalistic prospective investigation could elucidate the neurocognitive mechanisms mediating long-term episodic stages of depression (Frodl et al, 2008). Such markers might facilitate early detection and intervention to minimize cumulative effects (Clark et al, 2009).

Memory biases are an important aspect of the cognitive symptoms in MDD (Airaksinen et al, 2007; Ebmeier et al, 2006). Studies consistently point to enhanced memory for negative emotional information and diminished memory for positive emotional information (Bower, 1981; Gotlib et al, 2004; Leppänen, 2006; Rinck and Becker, 2005). It has been proposed that altered emotional memory formation may predict remission or the risk of recurrence (Pringle et al, 2011). Notably, it has been shown that patients who were going to improve 9 months later had better memory for positive stimuli at the start of the study than those who did not improve (Johnson et al, 2007). Moreover, activation of cortical midline structures during encoding of negative pictures has been found to be predictive of worsening of depressive symptoms in a small sample (Foland-Ross et al, 2014). However, the neural mechanisms associated with depressive course trajectories that relate to memory processing of both positive and negative materials have not been explored so far.

Previously, we observed a blunted response of the hippocampus (a key structure for memory processing) during positive memory encoding and an elevated insular response during negative memory encoding in depressed patients (van Tol et al, 2012). Because hippocampal and insular effects were present irrespective of current clinical status and therefore may represent trait markers associated with the course of MDD, we hypothesized that hippocampal and insular activation differentiates patients with a favorable course trajectory from patients with an unfavorable course trajectory.

The aim of this study was to investigate whether the neural correlates of memory processing are associated with the subsequent course of depression. Patients in a current depressive episode and healthy controls (HCs) were scanned using functional magnetic resonance imaging (fMRI). We hypothesized that patients who would not remit within 2 years would show better memory performance for negative words and worse memory performance for positive words at baseline compared with patients who would remit and HCs. On a neural level, we hypothesized that blunted hippocampal activation during positive word encoding and higher hippocampal, amygdalar, and insular activation during negative word encoding are associated with a non-remitting course. Second, we aimed to more broadly explore whether the activation in regions previously related to treatment response (ie, ACC, lateral PFC, medial PFC, hippocampus, insula, and amygdala) during positive and negative word processing is associated with subsequent course.

Materials and Methods


Participants were recruited from the ongoing longitudinal naturalistic Netherlands Study of Depression and Anxiety (NESDA), involving the University Medical Center Groningen (UMCG), Academic Medical Center (AMC), and Leiden University Medical Center (LUMC). The ethical review boards of each center approved this study and all participants gave written informed consent. Complete data at baseline (S1) were available for 215 participants, of whom 110 patients had a half-year diagnosis of MDD based on the Composite International Diagnostic Interview (CIDI life time-version 2). We included only patients who showed depressive symptoms indicative of at least a mild depressive episode according to the Inventory of Depressive Symptomatology (IDS>13; Rush et al, 1996) on the day of scanning and for whom longitudinal 2-year follow-up data (S2) on course were available (n=74). During the follow-up period, patients received treatment as usual (or no treatment if wished by the patient). In addition, 45 HCs who did not have a current or lifetime diagnosis of a DSM-IV disorder were included. Detailed sample characteristics and exclusion criteria are described in Table 1 and Supplementary Materials and Methods.

Table 1 Demographic Description

Diagnostic status and depressive state were assessed at S1 and S2 (Table 1) with the CIDI and life chart method (LCM) (Lyketsos et al, 1994). The methodology of the LCM has been shown to have high validity and reliability (Warshaw et al, 2001). Based on the LCM at S2, patients were divided into four trajectory groups. Patients who remitted within 1 year after S1 without recurrence in the follow-up period were defined as remitters (REM, n=22). We defined remission as a period of 3 months without symptoms or symptoms without burden. Recurrence (REC, n=23) was defined as recurrence of symptoms for at least 1 month with burden after first obtaining remission. Non-remission (NONREM, n=25) was defined as experiencing symptoms with burden in every month following S1 during the entire follow-up period. Patients who obtained remission later than 1 year after S1 were defined as slow remitters (n=4). We included slow remitters in NONREM for power reasons and followed up the results in NONREM by performing a sensitivity analysis leaving out these subjects. In total, we included 119 participants: 22 REM, 23 REC, 29 NONREM, and 45 HCs (Table 1). Furthermore, time to remission (months) was calculated for each patient and used as an index of course trajectory in the regression analyses.
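The trajectory definitions above can be sketched as a small classification routine. This is an illustrative sketch only, not the procedure used in the study: it assumes the life chart is reduced to one boolean per follow-up month (symptoms with burden or not), that time to remission is the month in which the first 3-month burden-free run starts, and that any patient without such a run counts as NONREM. The function names and the "SLOW" label are hypothetical.

```python
from typing import List, Optional

REMISSION_RUN = 3   # months without burden that count as remission (from the text)
FAST_CUTOFF = 12    # remission within 1 year after S1 (from the text)


def time_to_remission(burden: List[bool]) -> Optional[int]:
    """Return the 1-based month in which the first burden-free run of
    REMISSION_RUN months starts, or None if no such run exists.
    Assumption: burden[i] is True when month i+1 had symptoms with burden."""
    run = 0
    for month, has_burden in enumerate(burden, start=1):
        run = 0 if has_burden else run + 1
        if run == REMISSION_RUN:
            return month - REMISSION_RUN + 1
    return None


def classify(burden: List[bool]) -> str:
    """Assign a trajectory label from a monthly burden series (hypothetical helper)."""
    onset = time_to_remission(burden)
    if onset is None:
        return "NONREM"  # simplification: no 3-month remission at all
    # Months after the remission run; any month with burden counts as recurrence.
    after_remission = burden[onset - 1 + REMISSION_RUN:]
    if any(after_remission):
        return "REC"
    # Slow remitters were merged into NONREM in the study for power reasons.
    return "REM" if onset <= FAST_CUTOFF else "SLOW"
```

For example, a patient with burden in months 1 and 2 and none afterwards would be labeled REM with a time to remission of 3 months under these assumptions.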

Task Paradigm

All participants performed an event-related, subject-paced, emotional word-encoding and -recognition task (Daselaar et al, 2003; van Tol et al, 2012) that was presented using E-prime software (Psychological Software Tools, Pittsburgh, PA) during fMRI scanning. During encoding, participants classified 40 negative, 40 positive, and 40 neutral words according to their valence. Words were presented in pseudorandomized order. After a 10-min interval, these 120 words were presented again, mixed with 120 new emotional words, and participants had to indicate whether the word was seen during the encoding phase. A detailed description is in the Supplementary Materials and Methods.

fMRI Data Acquisition

fMRI data were collected with 3 T Philips MR scanners located at the three sites. A SENSE-8 channel head-coil was used in Groningen and Leiden and a SENSE-6 channel head-coil in Amsterdam. In Groningen, echo planar imaging volumes of 39 axial slices were acquired in interleaved ascending order (no gap) using a T2*-weighted gradient echo sequence (TR=2300 ms, TE=28 ms, matrix size=64 × 64, plane resolution=3 × 3 mm2, slice thickness=3 mm). Settings for Leiden and Amsterdam were slightly different: 35 slices, TR=2300 ms, TE=30 ms, matrix size=96 × 96, plane resolution=2.29 × 2.29 mm2, slice thickness=3 mm. Additionally, an anatomical MRI was obtained with a 3D gradient-echo T1-weighted sequence (TR=9 ms, TE=3.5 ms, matrix size=256 × 256, voxel size=1 × 1 × 1 mm3, 170 slices).

Data Analysis

Clinical variables and behavioral data

Demographic, psychometric assessment, and behavioral data were analyzed in SPSS v.16.0 (SPSS, Chicago, IL). For the demographic and psychometric data, we used analyses of variance (ANOVA), χ2 tests, and t-tests where appropriate with a significance level of p<0.05 (after Bonferroni correction if appropriate).

For the behavioral data, reaction times (hits and false alarms), number of words classified according to valence and recognition accuracy (proportion hits and false alarms) (Tulving, 1985) were calculated. Main effects of group (4; HC, REC, REM, NONREM) and valence (3; positive, neutral, negative) and the interaction of group and valence were investigated with a repeated-measures ANOVA. In case a significant main effect or interaction effect was detected (p<0.05), post hoc t-tests were conducted at a significance level of p<0.05 (two-tailed) after Bonferroni correction.
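As an illustration of the recognition accuracy measures named above, the proportion of hits and false alarms per valence could be computed along the following lines. This is a minimal sketch under stated assumptions: each recognition trial is represented as a tuple of valence, whether the word was old, and whether the participant responded "old"; the trial format and the function name are hypothetical and not taken from the study's analysis scripts.

```python
from typing import Dict, Iterable, Tuple

# (valence, word_was_presented_at_encoding, participant_responded_old)
Trial = Tuple[str, bool, bool]


def recognition_rates(trials: Iterable[Trial]) -> Dict[str, Dict[str, float]]:
    """Return the proportion of hits and false alarms for each valence."""
    counts: Dict[str, Dict[str, int]] = {}
    for valence, is_old, said_old in trials:
        c = counts.setdefault(valence, {"old": 0, "hits": 0, "new": 0, "fa": 0})
        if is_old:
            c["old"] += 1
            c["hits"] += int(said_old)   # old word correctly recognized
        else:
            c["new"] += 1
            c["fa"] += int(said_old)     # new word wrongly called old
    rates: Dict[str, Dict[str, float]] = {}
    for valence, c in counts.items():
        rates[valence] = {
            "p_hit": c["hits"] / c["old"] if c["old"] else float("nan"),
            "p_false_alarm": c["fa"] / c["new"] if c["new"] else float("nan"),
        }
    return rates
```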

Imaging data

A full description of preprocessing and modeling using Statistical Parametric Mapping software version 5 can be found in van Tol et al (2012) and in the Supplementary Materials and Methods. For each participant, we defined the following contrasts: [successfully encoded positive words>successfully encoded neutral words], [successfully encoded negative words>successfully encoded neutral words], [correctly recognized positive words>correctly recognized neutral words], and [correctly recognized negative words>correctly recognized neutral words]. These contrasts were chosen to focus on valence specificity of both encoding and recognition phases. The number of error trials was too low (reported in van Tol et al, 2012) to test for proper memory effects (ie, [successfully encoded words>missed words], [successfully recognized words>missed words]).

Group Analyses

At second level, two four (group; HC, REM, REC, NONREM) by two (valence; positive>neutral and negative>neutral) flexible factorial models were set up separately for the encoding and recognition phase. In the first model, the main effect of group was assessed (across valence; model 1), and in the second model, the interaction between group and valence (model 2) was modeled. We set up these two models, because the sensitivity to detect interaction effects increases by leaving out the general differences between groups, which is not specific for the task (Gläscher and Gitelman, 2008), and thus main effects and interactions cannot be estimated in the same model. To test for valence-specific effects of group, we subsequently set up a full-factorial model, with group as between-subject factor and valence as within-subject factor. This model was only used to test for group effects during either positive or negative encoding/recognition. Site was added by means of two dummy variables to each model, in addition to age and years of education. Analyses were repeated after excluding slow remitters (n=4).
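The nuisance covariates described above (two dummy variables for the three sites, plus age and years of education) can be illustrated with a short sketch. It is not the SPM batch used in the study: the choice of Groningen as the reference site, the column names, and the helper functions are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def site_dummies(sites: Sequence[str], reference: str) -> Tuple[List[str], List[List[float]]]:
    """Dummy-code site labels against a reference site.
    With three sites this yields two dummy columns, as in the text."""
    levels = sorted(set(sites) - {reference})
    rows = [[1.0 if site == level else 0.0 for level in levels] for site in sites]
    return levels, rows


def nuisance_columns(
    sites: Sequence[str],
    ages: Sequence[float],
    education_years: Sequence[float],
    reference: str = "Groningen",  # assumption: the reference site is arbitrary here
) -> Tuple[List[str], List[List[float]]]:
    """Build one row of nuisance covariates per participant."""
    levels, dummies = site_dummies(sites, reference)
    names = [f"site_{level}" for level in levels] + ["age", "education_years"]
    rows = [d + [float(a), float(e)] for d, a, e in zip(dummies, ages, education_years)]
    return names, rows
```

Participants from the reference site receive zeros in both dummy columns, so the two columns together capture mean differences between the three scanners.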

Consistent with our previous report (van Tol et al, 2012), main effects and interactions (F-tests) were explored at p<0.005 uncorrected. Post hoc t-tests had to meet p<0.05 familywise error (FWE) corrected at the voxel level for the spatial extent of a priori defined regions of interest (ROIs) (with an initial threshold of p<0.005, uncorrected). To follow up our previous results (van Tol et al, 2012), we created two sets of masks for testing main effects of trajectory group: the right hippocampus and left ACC for positive word encoding, and the right hippocampus, left amygdala, and left insula for negative word encoding. The regions were defined by the Automated Anatomical Labeling (AAL) system (Tzourio-Mazoyer et al, 2002) implemented in the Wake Forest University Pick Atlas toolbox (Wake Forest University, Winston Salem, NC). We set significance for effects occurring in these regions at pFWE<0.05, corrected for the number of ROIs (ie, 2 for positive encoding; 3 for negative encoding). As the signal in these regions is non-independent, we took their interdependency into account when calculating the α-level by using the Simple Interactive Statistical Analysis Bonferroni tool (SISA Bonferroni). Correlation analysis showed a mean correlation of r=0.64 between the signal in the left insula, left amygdala, and right hippocampus during negative word encoding as defined by the AAL label. Therefore, the threshold was set to α=0.034 to hold FWE-control during negative word encoding after SISA-Bonferroni adjustment. The mean correlation between the right hippocampus and left ACC was r=0.51 in the corresponding AAL labels and the SISA-Bonferroni adjustment was set as α=0.029 to hold FWE-correction during positive encoding.
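To make the idea of a correlation-adjusted threshold concrete, the sketch below implements one common form of such an adjustment: a Šidák-type correction in which the number of tests k is replaced by an effective number k^(1-r), where r is the mean correlation between outcomes. That the SISA tool applies exactly this formula is an assumption here, and the function name is hypothetical; the sketch is not a reconstruction of the authors' calculation.

```python
def correlation_adjusted_alpha(alpha: float, n_tests: int, mean_r: float) -> float:
    """Sidak-type per-test alpha with a correlation-based effective number of tests.

    Assumed formula: alpha_adj = 1 - (1 - alpha) ** (1 / n_tests ** (1 - mean_r)).
    mean_r = 0 gives the ordinary Sidak correction; mean_r = 1 gives no correction.
    """
    if not 0.0 <= mean_r <= 1.0:
        raise ValueError("mean_r must lie between 0 and 1")
    effective_tests = n_tests ** (1.0 - mean_r)
    return 1.0 - (1.0 - alpha) ** (1.0 / effective_tests)


# Three ROIs with a mean correlation of 0.64, as reported for negative encoding.
print(round(correlation_adjusted_alpha(0.05, 3, 0.64), 3))  # 0.034
```

Under this assumed formula, three ROIs with r=0.64 give a threshold of about 0.034, which agrees with the value reported for negative word encoding; the other reported thresholds may reflect different inputs or rounding.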

To explore the effect in a broader set of regions previously associated with depression and treatment response (ie, bilateral ACC, lateral PFC, medial PFC, hippocampus, insula, and amygdala), we defined a separate set of a priori AAL masks. SISA-Bonferroni adjustment values were set for positive (α=0.027; mean r=0.65) and negative encoding (α=0.029; mean r=0.69) for these ROIs (n=6). Effects occurring outside our ROIs had to meet p<0.05 FWE, whole brain corrected.

Regression Analyses

To test the effect of illness trajectory within patients, we built two full factorial models for encoding and recognition with valence as factor (2; positive>neutral and negative>neutral) and time to remission as an interacting covariate with valence. Age, years of education, and site (two dummy variables) were added as covariates. We used the same set of a priori AAL masks as used for the explorative group analyses. SISA-Bonferroni adjustment values were set for positive (α=0.028; mean r=0.68) and negative encoding (α=0.031; mean r=0.74) for these ROIs (n=6). Effects occurring outside our ROIs had to meet p<0.05 FWE, whole brain corrected.

Next, we added illness severity at S1 (IDS scores) as a covariate to each model to test for its possible confounding effect. Finally, to control for medication use, we repeated these analyses after excluding the selective serotonin reuptake inhibitor (SSRI) users. In addition, to control for a possible effect of psychotherapy use at S1, we repeated the analyses by adding it as a dummy covariate (yes/no).

Results


Sample Characteristics

Sample characteristics are listed in Table 1. The four groups were comparable on age, sex, and years of education. The patient groups were comparable on SSRI use at S1 and S2, psychotherapy use at S1, childhood trauma, IDS scores at S1, Beck Anxiety Inventory scores at S1 and S2, comorbidity with anxiety at S1, years since onset of depression at S1, and months with depression in the 5 years before S1. In the follow-up period, NONREM received more psychotherapy than the other patient groups, but pharmacological treatment was received to a similar extent. IDS scores at S2 in REM were significantly lower than in NONREM.

There was a significant group effect in time to remission: time to remission in REC and REM was shorter compared with NONREM. The REC and REM groups did not differ in time to remission.

No effect of trajectory group was observed on memory performance. Behavioral results are summarized in Figure 1 and Supplementary Table S1.

Figure 1

Behavioral data. (a) Word classification during encoding, Y-axis: number of words (b) reaction time (Encoding phase), Y-axis, seconds (c) recognition accuracy (Hits), Y-axis, proportion and (d) reaction time (Recognition phase) Y-axis, seconds. HC, healthy controls; NONREM, non-remitted patients; REC, remitters with recurrence; REM, fast remitters.

fMRI Results


Encoding results are listed in Table 2A and Figure 2a (group analysis) and Table 3 and Figure 2b (regression analysis). During successful encoding, a main effect of group was observed in the right insula, right dorsolateral PFC (DLPFC), right parahippocampal gyrus, and right posterior cingulate cortex extending to the fusiform gyrus (model 1), indicating higher activation in NONREM compared with HCs. However, post hoc t-tests across valences did not survive correction for multiple comparisons in these areas.

Table 2A Main Effect of Group and Interaction Between Group and Valence During Successful Encoding
Figure 2

Brain activation during emotional memory task. (a) Higher activation of the left insula in patients with non-remission during negative word encoding in group analysis. (Contrast: NONREM>HC; effects are displayed at T>2.60, p<0.005 uncorrected). (b) Higher activation of the right hippocampus in patients with non-remission during negative encoding in regression analysis. (Contrast: NONREM>HC, effects are displayed at T>2.60, p<0.005 uncorrected). NONREM: non-remitted patients; REM: fast remitters; REC: remitters with recurrence; HC: healthy controls.

Interactions of group and valence were observed in the left insula, right DLPFC, and right hippocampus (model 2), indicating higher activation in NONREM compared with HCs and REM during negative word encoding, but not during positive word encoding. We followed up these interactions with valence-specific group comparisons (within the full-factorial models). These confirmed that during negative word encoding, a main effect of group was present in the left insula (Figure 2A). Post hoc t-tests revealed that NONREM showed higher activation in this region than HCs and subthreshold higher activation than REM (pFWE=0.24), during negative word encoding. A similar pattern of insula activation was also found after excluding SSRI users (Supplementary Table S3a) and excluding the slow remitting patients from NONREM (Supplementary Table S4a). Adding illness severity at S1 (demeaned within group) or psychotherapy use at S1 as covariate did not affect the results (Z=3.85, pFWE=0.014 for IDS; Z=3.66, pFWE=0.027 for psychotherapy use).

Finally, trends of higher activation in the right hippocampus were found when comparing NONREM with REM (pFWE=0.07) and HCs (pFWE=0.079), and in the right amygdala comparing NONREM and HCs (pFWE=0.049).

Multiple regression analysis revealed that activation of the right hippocampus was positively related to time to remission during negative word encoding. Adding IDS scores or psychotherapy use at S1 to the model did not change the result (Z=3.97, pFWE=0.01 for IDS; Z=3.94, pFWE=0.01 for psychotherapy use). After excluding the SSRI users (n=24), the effect was observed subthreshold (Z=2.62, puncorrected<0.005, pFWE=0.43). To check post hoc whether SSRI use had a direct effect on hippocampal activation during negative encoding, we directly compared SSRI users with non-users. However, no effect was observed (t=0.65, p=0.52).


Recognition effects are listed in Table 2B. During successful recognition, a main effect of group was observed in the left ventrolateral PFC (VLPFC) and interactions between group and valence were found in the left insula, left amygdala, and right ACC. Planned valence-specific comparisons revealed a main effect of group during negative word recognition in the left insula, left VLPFC, and right DLPFC. However, post hoc t-tests did not survive correction for multiple comparisons in these areas.

Table 2B Main Effect of Group and Interaction Between Group and Valence During Correct Recognition

No association was found between brain activation during recognition and time to remission.

Table 3 Multiple Regression Analyses with Time to Remission as Predictor Across Patient Groups

Discussion


In this study, we investigated whether regional brain activation during emotional encoding and recognition in depressive patients was associated with subsequent course trajectory. We found that higher activation of the left insula during negative word encoding related to a non-remitting course. Groups also differed in activation of the right hippocampus and left amygdala during negative encoding, with a trend for higher activation in non-remitters compared with HCs. Moreover, higher hippocampal activation during negative word encoding was associated with delayed remission. Effects were unrelated to illness severity at baseline, although the association between time to remission and hippocampal activation was subthreshold only after exclusion of SSRI users. Memory performance or encoding behavior was not related to course. Taken together, these results indicate that insular and hippocampal activation during negative information processing may serve as neural markers of an unfavorable course in MDD.

Previously, in our study regarding the baseline fMRI measurement, we observed that elevated insula activation was associated with MDD, irrespective of current symptomatic state (van Tol et al, 2012). By including longitudinal data on the course trajectory, we could now add to this finding that insular function is indeed a neural marker of subsequent course of MDD that is not related to severity of depressive symptomatology. The insula has been associated with subjective awareness of negative feelings and the experience of visceral states following emotional events (Menon and Uddin, 2010; Singer et al, 2009). Higher activation of the left insula in non-remitters during negative word encoding might reflect increased sensitivity to negative stimuli (Surguladze et al, 2010). Such potentiation of the insula might contribute to cognitive symptoms such as difficulty in disengaging from negative information (Fu et al, 2013) and enhanced biased processing of negative information (Herwig et al, 2007), thereby hampering remission. Treatment studies have suggested that higher insular activation is predictive of poor response to treatment (Fu et al, 2013) and insular metabolism might differentiate between responders to cognitive behavioral therapy or citalopram treatment (McGrath et al, 2013). Our study suggests that higher activation of the left insula during negative information processing may also relate to the naturalistic course of MDD. Thus, insular function may affect the odds of remission, but may also affect the odds that a patient responds to a specific treatment strategy. Future studies should therefore investigate putative mechanisms that can help explain how insular activation and metabolism may contribute to an unfavorable course.

In contrast to our hypothesis, course was not associated with posterior hippocampal activation during positive encoding, but with anterior hippocampal activation during negative encoding. Given the important role of the hippocampus in memory of emotional items (Dolcos et al, 2004) and relational processing in memory encoding (Poppenk et al, 2013; Schacter and Wagner, 1999), our results suggest that MDD patients who are likely to suffer from depressive symptoms for a longer time tend to allocate more resources to encoding negative stimuli. This might suggest a ‘potentiation’ of negative information, which in turn may perpetuate the course of depression and postpone remission. Moreover, the anterior hippocampus is richly interconnected with the amygdala, which has been found to facilitate memory processing of emotional stimuli by modulating hippocampal activation in healthy individuals (Disner et al, 2011). Adding to the finding of an association between small hippocampal volume and poor clinical response (Fu et al, 2013), our finding indicates that hippocampal activation during negative encoding might be a neural marker of a prolonged course of depression. In addition, amygdalar activation has been found to correlate with hippocampal activation during mood-congruent memory encoding in MDD patients (Hamilton and Gotlib, 2008). Although we only found a trend of higher amygdalar activation, our results imply that hyperactivity in the anterior hippocampus might reflect an additional modulation of the amygdala during mood-congruent memory. We observed that hippocampal activation was unrelated to depressive severity, which implies that it could serve as a neural marker of depressive course independent of illness severity. Of note, associations were less strong after excluding the SSRI users from the analysis, which most likely reflects a substantial drop in power in this sensitivity analysis, as no significant effect of SSRI use on hippocampal activation was observed. We therefore conclude that the drop in significance does not reflect a confounding effect of SSRI use on hippocampal activation per se, but is due to the relative drop in load scores after excluding SSRI users.

Contrary to our expectation, no differences were observed in the PFC or ACC during emotional encoding or recognition, indicating that activation of these regions during this task may be less related to clinical course. Because the lateral PFC has been found to be hyperactivated during retrieval of mood-incongruent stimuli in symptomatic phases of MDD and after recovery (Van Wingen et al, 2010), our results suggest a trait-related rather than a course-predictive role of this region in memory processing. Activation of the medial PFC and posterior cingulate cortex during encoding of negative pictures has been associated with worsening of symptoms over an 18-month period (Foland-Ross et al, 2014). However, we focused on the trajectories describing the course of MDD over 2 years and did not take change in symptom severity at a certain timepoint as our end point, which might explain the differential association with cortical midline activation. Pretreatment pregenual-ACC activation has also been identified as a marker of treatment response in depression (Fu et al, 2013; Kemp et al, 2008; Pizzagalli, 2010), although most studies focused on comparisons between responders and non-responders to short-term treatment during the processing of relatively simple tasks of emotional processing or during rest (reviewed by Pizzagalli, 2010). The current results, however, do not support pregenual-ACC activation during emotional memory as a potential neural marker of naturalistic remission of depression.

Some limitations of our study should be noted. First, we calculated our results in the context of emotional word evaluation for the purpose of later recognition and could not contrast remembered vs forgotten words. Therefore, inferences on memory success with respect to forgotten words could not be drawn because of the relative lack of forgotten words (possibly caused by the short retention interval of only 10 min). Second, although we controlled for site effects by adding site as a covariate, the different data acquisition settings at the three sites could still have confounded the results, albeit to a minor degree, as no bias of group by site was present. Third, although patient groups did not differ in medication and psychotherapy use at the time of scanning, there could still be an effect of medication dose and frequency of treatment. Fourth, we focused only on the depressive course and did not control for the course of comorbid anxiety symptoms. Anxiety severity in patient groups, however, did not differ at baseline and follow-up measurement, suggesting this could not explain our findings. Fifth, the age-matched patient groups showed variations in their course of depression in the follow-up period, which were, however, not defined by the months with depression in the 5 years before S1. This could be explained by the comparably young and relatively mild groups we included. Sixth, although the LCM has been reported to have high reliability and validity (Warshaw et al, 2001), this retrospective method could possibly be biased by current mood state. Finally, although we had over 20 subjects in each subgroup to obtain sufficient reliability (Thirion et al, 2007), subgroups were rather small and we did not correct the omnibus test for multiple comparisons. Therefore, replication of our naturalistic study is needed.

In conclusion, our results suggest that a prolonged course of depression is associated with higher activation in the left insula during negative memory processing, whereas a course with delayed remission is associated with higher hippocampal activation. Further longitudinal studies are necessary to clarify whether abnormal insular and hippocampal function changes as a function of time with depression or may serve as ‘load’-independent markers of MDD course.

Funding and Disclosure

The infrastructure for the NESDA study is funded through the Geestkracht program of the Netherlands Organization for Health Research and Development (Zon-Mw, grant no. 10-000-1002). This study is supported by the participating universities and mental health-care organizations: VU University Medical Center Amsterdam, University Medical Center Groningen, Leiden University Medical Center, GGZ inGeest, Arkin, GGZ Rivierduinen, Lentis, GGZ Friesland, GGZ Drenthe, Scientific Institute for Quality of Healthcare (IQ healthcare), Netherlands Institute for Health Services Research (NIVEL) and Netherlands Institute of Mental Health and Addiction (Trimbos Institute). Hui Ai, Esther Opmeer, Mark van Buchem, Dick Veltman, and Marie-José van Tol declare no conflict of interest. Nic van der Wee received speaking fees from Eli Lilly and Wyeth; and served on advisory panels of Eli Lilly, Pfizer, Wyeth and Servier. André Aleman received speaker fees from Lundbeck. All of these activities are not directly related to the present study and, therefore, do not form a conflict of interest.