Introduction

It is well established that major depressive disorder (MDD) is accompanied by disruptions in several cognitive domains [1]. Working memory (WM) is fundamental to the performance of many cognitive tasks and day-to-day activities [2], and deficits in WM are top-ranked endophenotype candidates for recurrent MDD [3]. Neuroimaging studies investigating the mechanisms underlying WM performance in MDD have reported conflicting results as to whether MDD patients show hyper- or hypofrontality during WM. Studies in healthy subjects have clearly demonstrated that the dorsolateral prefrontal cortex (DLPFC) is implicated in numerous cognitive functions relevant to WM. Activity in the anterior cingulate cortex (ACC) during WM tasks is often described in relation to increased effort, complexity, or attention [4, 5]. Because increased task-related frontal activation has been reported in MDD patients without behavioral WM impairments, it has been hypothesized that patients need greater activation within the same neural network to maintain a level of performance similar to that of healthy control subjects (HC) [6,7,8]. In this theoretical model, hyperfrontality in MDD compensates for a lack of deactivation in regions of the default-mode network (DMN [9]). By contrast, other studies reported hypoactivation in the ACC [10] and parietal cortex [11], together with impaired WM performance in patients. Overall, these findings suggest that intact performance in MDD may be associated with increased cortical activity, whereas impaired performance may be associated with reduced cortical activation. In line with this view, a recent meta-analysis by Wang et al. [12] that matched task performance for predominantly verbal WM demands reported hyperactivation exclusively in the left DLPFC and hypoactivation in the precuneus and insula.
However, even though this meta-analysis included 11 studies and 160 patients, the data were derived from 13 different WM experiments using a wide range of stimuli. This makes the patterns of hyper- and hypoactivation in distinct brain regions difficult to interpret. Such difficulties are further underlined by another recent meta-analysis of 34 neuroimaging experiments on cognitive processing in MDD patients, which found no significant results across studies. The authors attributed this null finding most likely to methodological differences in the stimuli and experimental designs used, to small and heterogeneous patient groups, and to inappropriate statistical inference procedures [13].

Since impairments at the cognition–emotion interface, rather than of cognitive functions per se, may be most characteristic of MDD [1], it seems particularly worthwhile to investigate cognition–emotion interactions in the brain in order to broaden our understanding of WM deficits in MDD. Studies in HC have demonstrated that emotional state and mood influence WM [14, 15], presumably via the activation of mood-congruent representations in WM [16], as negative mood is related to more frequent negative thoughts and to selective attention to negative stimuli [17, 18]. Consequently, negative biases in MDD might prevent MDD patients from effectively completing ongoing cognitive tasks and result in WM deficits [19]. Such biases comprise a pattern of cognition biased toward negative information and the resulting inability to reallocate attention away from negative emotional information. Furthermore, impairments in the ability to flexibly and efficiently update WM might prevent the removal of negative, no longer goal-relevant content from WM and thus facilitate perseverative thinking such as rumination [20]. Accordingly, higher levels of maladaptive, depressive rumination have been associated with greater DMN dominance, i.e., increased DMN activation at rest [21]. Increased resting-state activity in DMN regions is accompanied by a corresponding lack of task-induced deactivation, which has been related to cognitive deficits in MDD [9, 22].

The aim of the present study was to gain a better understanding of WM deficits in MDD by overcoming some of the difficulties that might have caused inconsistent results in previous studies. By investigating a large sample of MDD patients and HC during a WM task that incorporated positive, negative, and neutral stimuli, we aimed to probe WM processes in the context of cognition–emotion interactions in the brain. Moreover, rather than relying on classical mass-univariate inference, we applied a multivoxel pattern classification (MVPC) approach. This approach circumvents the main limitations of classical general linear model analysis and allows for the detection of distributed patterns of activity, while also providing a solution to the multiple comparisons problem. We hypothesized that this approach would enable discrimination between patients and HC based on WM-related brain activity. Our goal in this context was to design a model that would allow us to identify the brain regions most informative for group discrimination. Specifically, we expected to find MDD-associated hyperactivation in WM-related regions such as the DLPFC and ACC, and a lack of deactivation in regions of the DMN. We expected these differences to be most pronounced for negative emotional stimuli and to be meaningfully associated with ruminative tendencies.

Materials and methods

Subjects

Male and female subjects with an acute MDD episode (N = 57) and matched healthy control subjects (N = 61) were recruited at the Free University of Berlin (FUB) and at the University of Zurich (UZH) via advertisements in local newspapers and mailing lists. Additionally, patients were recruited at the Affective Disorders Unit of the Department of Psychiatry (UZH). Healthy subjects were screened for psychiatric disorders using the short version of the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders. Exclusion criteria for all subjects were major medical illness, history of seizures, head trauma with loss of consciousness, and pregnancy. Healthy controls were additionally required to have no present or past diagnosis of a psychiatric or neurologic disease. Specific psychiatric exclusion criteria for patients were:

- atypical forms of depression;
- suicidal ideation;
- any additional psychiatric disorder;
- history of substance abuse or dependence;
- electroconvulsive therapy in the previous 6 months.

Patients currently taking antidepressants were included provided that their medication had not been changed during the 4 weeks before study entry. The study was carried out in accordance with the latest version of the Declaration of Helsinki. Participants entered the study after receiving a full explanation of its purpose and procedures and after providing written informed consent, as approved by the institutional review boards.

Task and procedure

Stimuli were German nouns taken from the Berlin Affective Word List (BAWL [23]). The stimuli were classified as positive, negative, or neutral according to the BAWL norms and were matched for arousal, imageability, and number of letters. The stimuli were presented consecutively within a 2-back WM task. This task provides an established means both of studying the interface between WM and emotion and of eliciting BOLD responses in cognition- and emotion-related regions [24, 25]. Each block consisted of 15 words of either positive, negative, or neutral valence. Each word was presented for 500 ms with an interstimulus interval of 1500 ms, and each block was followed by a fixation period (10–14 s). In total, the task consisted of 15 blocks (5 per valence category), and 225 stimuli were presented. Participants responded by pressing a key on a fiber-optic, light-sensitive response device.
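The timing arithmetic of this block design, together with a 2-back matching rule, can be sketched in Python. This is a minimal illustration under stated assumptions. The source does not specify the response rule, target frequency, or block order. The 2-back rule used here (a word is a target if it matches the word shown two positions earlier), the randomized block order, and the German placeholder words are therefore all hypothetical.

```python
import random

# Timing parameters taken from the task description above.
STIM_MS = 500           # word presentation time
ISI_MS = 1500           # interstimulus interval
WORDS_PER_BLOCK = 15
BLOCKS_PER_VALENCE = 5
VALENCES = ("positive", "negative", "neutral")


def two_back_targets(sequence):
    """ASSUMED 2-back rule: item i is a target if it matches item i - 2."""
    return [i >= 2 and sequence[i] == sequence[i - 2] for i in range(len(sequence))]


def block_schedule(seed=0):
    """Hypothetical randomized order of the 15 blocks (5 per valence)."""
    rng = random.Random(seed)
    blocks = [v for v in VALENCES for _ in range(BLOCKS_PER_VALENCE)]
    rng.shuffle(blocks)
    return blocks


def block_duration_s():
    """Length of one word block in seconds, excluding the following fixation."""
    return WORDS_PER_BLOCK * (STIM_MS + ISI_MS) / 1000.0


schedule = block_schedule()
total_stimuli = len(schedule) * WORDS_PER_BLOCK
# Each block is followed by a jittered 10-14 s fixation period.
min_run_s = len(schedule) * (block_duration_s() + 10)
max_run_s = len(schedule) * (block_duration_s() + 14)

print(total_stimuli, block_duration_s(), min_run_s, max_run_s)  # 225 30.0 600.0 660.0
# Placeholder words, not claimed to be BAWL items:
print(two_back_targets(["Haus", "Sonne", "Haus", "Tisch", "Haus"]))  # [False, False, True, False, True]
```

Under these assumptions, each word block lasts 30 s (15 × 2 s). The 15 blocks plus fixation then add up to roughly 10–11 min of task time, ignoring any initial dummy scans.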

Psychometric measures

The Beck Depression Inventory [26] was used to determine depression severity. The Response Styles Questionnaire was used to measure trait-like coping styles, including rumination, that are considered independent of state effects of depressed mood [27].

Functional magnetic resonance imaging data acquisition and analysis

Functional data were acquired on a Siemens Trio 3T scanner (FUB) and a Philips Achieva 3T scanner (UZH) using standard echo planar imaging sequences [24, 25]. They were preprocessed in SPM12 (Wellcome Trust Centre for Neuroimaging, London, UK) using standard parameters (for details see Supplementary Materials and Methods). A fixed-effects model was estimated at the single-subject level to create images of parameter estimates. For each subject, the following contrast images were calculated:

1. all WM conditions versus the fixation condition (WM > fixation);
2. positive WM condition versus fixation (Pos > fixation);
3. negative WM condition versus fixation (Neg > fixation);
4. neutral WM condition versus fixation (Neu > fixation);
5. emotional WM conditions versus the neutral WM condition (Emo > neutral).
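The five contrasts can be written as weight vectors over the condition-wise parameter estimates. The sketch below assumes one regressor each for fixation, positive, negative, and neutral blocks, in that order. Fixation is treated as an explicitly modeled condition because its raw estimate is analyzed later. The actual SPM design matrix would also include nuisance regressors and session constants. The weight scaling shown, which averages over conditions, is one choice among several.

```python
import numpy as np

# ASSUMED regressor order; real SPM designs also contain nuisance regressors.
CONDITIONS = ("fixation", "positive", "negative", "neutral")

# Weights sum to zero, so each contrast is a difference of condition means.
CONTRASTS = {
    "WM > fixation": np.array([-1.0, 1 / 3, 1 / 3, 1 / 3]),
    "Pos > fixation": np.array([-1.0, 1.0, 0.0, 0.0]),
    "Neg > fixation": np.array([-1.0, 0.0, 1.0, 0.0]),
    "Neu > fixation": np.array([-1.0, 0.0, 0.0, 1.0]),
    "Emo > neutral": np.array([0.0, 0.5, 0.5, -1.0]),
}


def contrast_images(betas):
    """Map condition betas of shape (4, n_voxels) to one contrast image per contrast."""
    betas = np.asarray(betas, dtype=float)
    if betas.shape[0] != len(CONDITIONS):
        raise ValueError("expected one row of betas per condition")
    return {name: weights @ betas for name, weights in CONTRASTS.items()}


# Toy single-voxel example: fixation=1, positive=2, negative=3, neutral=4.
maps = contrast_images(np.array([[1.0], [2.0], [3.0], [4.0]]))
for name, value in maps.items():
    print(f"{name}: {value[0]:.2f}")  # 2.00, 1.00, 2.00, 3.00, -1.50
```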

Multivoxel pattern classification

MVPC was used to discriminate patients from controls and to identify group-associated patterns of activity. The contrast images described above served as input to the classifier. Support vector machines (SVM) with a linear kernel were employed for classification [28]. To determine the optimal model, three feature selection strategies were tested and compared:

- SVM without feature selection;
- SVM with recursive feature elimination [29];
- SVM with feature selection based on F-score ranking (SVM-fScore [30]).
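The three classifier variants can be sketched with scikit-learn, on which Nilearn builds. This assumes scikit-learn is available and is not the study's code. The `fisher_fscore` function implements a commonly used two-class F-score: the squared deviations of the two class means from the grand mean, divided by the sum of within-class sample variances. Whether this is exactly the definition used in [30] is an assumption. The values of `k_features` and `rfe_step` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def fisher_fscore(X, y):
    """Two-class F-score per feature; ASSUMED labels 1 = patient, 0 = control.
    Larger values mean the class means differ more relative to within-class spread."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    grand = X.mean(axis=0)
    between = (pos.mean(axis=0) - grand) ** 2 + (neg.mean(axis=0) - grand) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return between / np.maximum(within, 1e-12)  # guard against zero variance


def make_models(k_features=1000, rfe_step=0.1):
    """Three linear-SVM variants; k_features and rfe_step are hypothetical values."""
    return {
        "SVM": make_pipeline(SVC(kernel="linear")),
        "SVM-RFE": make_pipeline(
            RFE(SVC(kernel="linear"), n_features_to_select=k_features, step=rfe_step)
        ),
        "SVM-fScore": make_pipeline(
            SelectKBest(fisher_fscore, k=k_features), SVC(kernel="linear")
        ),
    }
```

In a full analysis, each pipeline would be refit within every cross-validation fold, so that feature selection never sees the held-out subject.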

The out-of-sample performance of the classifier was evaluated via leave-one-out cross-validation (LOOCV). LOOCV was chosen over k-fold schemes with larger test folds because relatively few samples were available. Indeed, LOOCV is a common approach in neuroimaging studies, where sample sizes are limited compared with machine learning applications in other domains. The classification weight maps for subsequent analyses were constructed by averaging the weights over all folds of the cross-validation. Because the separating boundary was linear, the feature weights could be interpreted directly: features with higher absolute weights were the most discriminative. Given the label convention applied, a positive weight indicated higher values in patients and a negative weight indicated higher values in controls. Classifier performance was evaluated in terms of accuracy, sensitivity, and specificity. All classification analyses were performed in Python (version 2.7.10) using the Nilearn library (v0.2.6). Permutation tests with 1000 repetitions were used to assess the statistical significance of the obtained classification accuracies [31]. As a control, out-of-sample performance was additionally evaluated with 10-fold cross-validation. Moreover, additional classification analyses were performed on the subject subgroups from each of the two scanning sites to assess the robustness of the results. Further details of the classification procedures are described in the Supplementary Materials and Methods.
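The evaluation loop can be sketched as follows, again with scikit-learn as an assumed stand-in for the study's code. Labels are assumed to be 1 for patients and 0 for controls, so a positive averaged weight means higher values in patients, matching the convention described above. The sketch uses a plain linear SVM. With a feature-selection step inside the pipeline, the per-fold weights would first have to be mapped back to the full voxel space, e.g., with the selector's `inverse_transform`, before averaging.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, permutation_test_score
from sklearn.svm import SVC


def make_linear_svm():
    return SVC(kernel="linear")


def loocv_evaluate(X, y, make_clf=make_linear_svm):
    """Return (metrics, fold-averaged weights). ASSUMED labels: 1 = MDD, 0 = HC."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    preds = np.empty_like(y)
    weights = []
    for train, test in LeaveOneOut().split(X):
        clf = make_clf().fit(X[train], y[train])
        preds[test] = clf.predict(X[test])
        weights.append(clf.coef_.ravel())  # coef_ exists for linear kernels only
    tp = np.sum((preds == 1) & (y == 1))
    tn = np.sum((preds == 0) & (y == 0))
    metrics = {
        "accuracy": float(np.mean(preds == y)),
        "sensitivity": tp / np.sum(y == 1),  # patients correctly identified
        "specificity": tn / np.sum(y == 0),  # controls correctly identified
    }
    # Positive mean weight -> feature tends to be higher in patients (label 1).
    return metrics, np.mean(weights, axis=0)


def permutation_p_value(X, y, n_permutations=1000, seed=0):
    """p-value of the LOOCV accuracy under random label permutation."""
    _, _, p_value = permutation_test_score(
        make_linear_svm(), X, y, cv=LeaveOneOut(), scoring="accuracy",
        n_permutations=n_permutations, random_state=seed)
    return p_value
```

`permutation_test_score` refits the whole cross-validation on label-shuffled data. Running 1000 permutations with LOOCV is therefore computationally heavy, and smaller values of `n_permutations` are useful for quick checks.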

Region of interest analyses

To further investigate potential differences between the valence conditions of the WM task, post hoc region of interest (ROI) analyses were conducted on the SVM-fScore weight map of the WM > fixation contrast. To focus on the most relevant regions, the 20% of voxels with the highest absolute classification weights were retained and the remaining 80% were masked. Additionally, a cluster-size threshold of 50 voxels was applied to focus on physiologically meaningful clusters.
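The thresholding and cluster-extent step can be sketched with NumPy and SciPy under several assumptions. The weight map and brain mask are taken to be 3-D arrays on the same grid. The 20% cutoff is applied to absolute weights within the mask. Positive and negative weights are clustered separately. Voxels are treated as face-connected, which is the default structuring element of `scipy.ndimage.label`. The study's exact clustering settings are not specified here.

```python
import numpy as np
from scipy import ndimage


def extract_rois(weight_map, brain_mask, keep_fraction=0.2, min_cluster=50):
    """Keep the top `keep_fraction` of |weights| inside the mask, then return
    (sign, boolean cluster mask) for clusters with at least `min_cluster` voxels.
    sign = +1: higher activity in patients; -1: higher activity in controls."""
    w = np.where(brain_mask, weight_map, 0.0)
    cutoff = np.quantile(np.abs(w[brain_mask]), 1.0 - keep_fraction)
    kept = brain_mask & (np.abs(w) >= cutoff) & (w != 0)

    rois = []
    for sign in (1, -1):
        # Cluster positive and negative weights separately (face connectivity).
        labels, n_clusters = ndimage.label(kept & (np.sign(w) == sign))
        for k in range(1, n_clusters + 1):
            cluster = labels == k
            if cluster.sum() >= min_cluster:
                rois.append((sign, cluster))
    return rois
```

Each returned cluster can then serve as a mask for averaging the COPE or RPE values of each subject for the ROI ANOVAs.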

Two types of mixed-model analyses of variance (ANOVAs), both with group (MDD versus HC) as the between-subject factor, were calculated on the selected ROI data. In the first ANOVA, the within-subject factor condition was based on the contrasts of parameter estimates (COPE) of the three WM conditions (Pos > fixation, Neg > fixation, and Neu > fixation). Since aberrant brain activity during rest has been reported in MDD [32], a second ANOVA that included the fixation condition was conducted on the raw parameter estimates (RPE) of all four conditions (fixation, positive, negative, and neutral). To account for multiple testing due to several parallel ANOVAs and post hoc t-tests, alpha levels were adjusted using Bonferroni correction.
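A two-factor split-plot ANOVA (one between-subject factor, one within-subject factor) and a Bonferroni adjustment can be written directly with NumPy and SciPy. This is a sketch of the classical uncorrected formulation, not the study's software. It applies no sphericity correction, and within-subject effects are tested against an error term with (N − a)(b − 1) degrees of freedom, where N is the number of subjects, a the number of groups, and b the number of conditions.

```python
import numpy as np
from scipy import stats


def mixed_anova(data, groups):
    """Split-plot ANOVA. data: (n_subjects, n_conditions); groups: one label per
    subject. Returns {effect: (F, df_effect, df_error, p)}; no sphericity correction."""
    Y = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    n, b = Y.shape
    levels = np.unique(groups)
    a = len(levels)
    grand = Y.mean()

    ss_total = np.sum((Y - grand) ** 2)
    ss_subjects = b * np.sum((Y.mean(axis=1) - grand) ** 2)
    ss_cond = n * np.sum((Y.mean(axis=0) - grand) ** 2)
    ss_group = 0.0
    ss_cells = 0.0
    for g in levels:
        Yg = Y[groups == g]
        ng = Yg.shape[0]
        ss_group += b * ng * (Yg.mean() - grand) ** 2
        ss_cells += ng * np.sum((Yg.mean(axis=0) - grand) ** 2)
    ss_inter = ss_cells - ss_group - ss_cond
    ss_subj_err = ss_subjects - ss_group  # subjects within groups
    ss_within_err = ss_total - ss_subjects - ss_cond - ss_inter

    df_group, df_subj_err = a - 1, n - a
    df_cond, df_inter = b - 1, (a - 1) * (b - 1)
    df_within_err = (n - a) * (b - 1)

    def f_test(ss, df, ss_err, df_err):
        F = (ss / df) / (ss_err / df_err)
        return F, df, df_err, stats.f.sf(F, df, df_err)

    return {
        "group": f_test(ss_group, df_group, ss_subj_err, df_subj_err),
        "condition": f_test(ss_cond, df_cond, ss_within_err, df_within_err),
        "interaction": f_test(ss_inter, df_inter, ss_within_err, df_within_err),
    }


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values (multiply by the number of tests, cap at 1)."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)
```

Two identities provide a convenient sanity check. With two within-subject levels, the interaction F equals the squared t statistic of an independent-samples t-test on the difference scores. The group F equals the squared t statistic of a t-test on subject means.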

Results

Sample and WM performance

Patients and controls did not differ in age or sex. Clinical, demographic, and behavioral data are summarized in Table 1. WM accuracy and reaction times were analyzed using two-way repeated measures ANOVAs with the within-subject factor condition (positive, negative, and neutral) and the between-subject factor group (MDD and HC). There were no significant main effects of group or condition and no significant group × condition interactions for either WM accuracy or reaction times. However, there was a trend toward a main effect of group on reaction times (F1,115 = 3.544, p = 0.062). Bonferroni-adjusted post hoc t-tests revealed marginally significantly slower reaction times in depressed patients for negative words (t115 = −2.27, p = 0.025). Behavioral data of one subject could not be analyzed because of a corrupted log file. A negative correlation was found between WM accuracy and rumination (Pearson’s coefficient, r46 = −0.35, p = 0.017).

Table 1 Demographic, clinical, and behavioral data

Functional magnetic resonance imaging: classification results

The MVPC method yielded significant classification accuracies for all WM versus fixation contrasts, as indicated by the permutation tests. The highest classification accuracies were observed for the Neu > fixation contrast (73.7%) and the Neg > fixation contrast (71.2%). Lower but still significant accuracies were obtained for the WM > fixation contrast (66.1%) and the Pos > fixation contrast (63.6%), whereas no significant prediction was found for the Emo > Neu contrast (49.2%). The highest classification accuracies were obtained with the SVM-fScore method. All classification results for this classifier are provided in Table 2, and a comparison of the results of all three feature selection methods is shown in Supplementary Table S1. The 10-fold cross-validation yielded performance comparable to LOOCV, as reported in Supplementary Table S2. The site-effect analyses showed that combining the samples from the two sites yielded either comparable or better performance than single-site classification. More information is given in the Supplementary Materials and Methods.
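Every reported accuracy is a multiple of 1/118. The accuracies can therefore be converted back into counts of correctly classified subjects, assuming all 57 patients and 61 controls entered the classification. The sketch below also computes the majority-class baseline (always predicting the larger group, HC) and a one-sided binomial tail probability against that baseline. The binomial check is only a rough cross-check added for illustration. It treats LOOCV predictions as independent, which they are not; the study's significance assessment relied on permutation tests.

```python
from scipy.stats import binom

N_MDD, N_HC = 57, 61
N = N_MDD + N_HC  # 118 subjects, ASSUMED to all enter the classification

reported_accuracy = {
    "Neu > fixation": 0.737,
    "Neg > fixation": 0.712,
    "WM > fixation": 0.661,
    "Pos > fixation": 0.636,
    "Emo > Neu": 0.492,
}

# Always predicting the larger group (HC) gives the majority-class baseline.
majority_baseline = max(N_MDD, N_HC) / N

# Implied number of correctly classified subjects for each contrast.
n_correct = {name: round(acc * N) for name, acc in reported_accuracy.items()}

for name, k in n_correct.items():
    # One-sided binomial tail P(X >= k) at the majority-class rate.
    # Rough illustration only: LOOCV predictions are not independent.
    p = binom.sf(k - 1, N, majority_baseline)
    print(f"{name}: {k}/{N} correct ({100 * k / N:.1f}%), binomial p = {p:.2g}")
```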

Table 2 Results of the SVM-fScore classification on the contrasts of interest

Functional magnetic resonance imaging: ROI results

The cluster extraction from the weight image of the WM > fixation contrast revealed 14 ROIs that met the cluster inclusion criteria. Seven ROIs were based on positive weights, indicating higher activity in MDD compared with HC. The other seven ROIs were based on negative weights, indicating higher activity in HC compared with MDD (Table 3 and Fig. 1). The contribution of the extracted ROIs based on average cluster weights is shown in Supplementary Figure S1.

Table 3 ROIs extracted from weight map of the WM > fixation contrast
Fig. 1

SVM weight map. The locations of the most relevant SVM classification weights from the WM > fixation contrast are shown (20% of the highest weights with a cluster threshold of 50 voxels). Red regions depict more activation in MDD patients; blue regions depict more activation in healthy controls.

The COPE-based ANOVA in the left DLPFC showed a main effect of group that was driven by stronger activation in MDD patients compared with HC (F1,116 = 9.38, p < 0.01). This finding was confirmed by the ANOVA based on the RPE, which showed main effects of group (F1,116 = 12.32, p < 0.001) and condition (F3,116 = 44.121, p < 0.001), and an interaction effect (F3,116 = 7.49, p < 0.001). Pairwise comparisons showed that these effects were driven by larger group differences (MDD > HC) in the three WM conditions than in the fixation condition (Fig. 2a, b).

Fig. 2

ROI results. a, b Results from the left DLPFC ROI. c, d Results from the dorsal ACC ROI. e, f Results from the PCC ROI. g, h Results from the right IPL ROI. i, j Results from the left STG/insula ROI. The left column shows the results for the contrast of parameter estimates. The right column shows the results for the raw parameter estimates. Asterisks depict significant differences (t-statistic) between MDD patients and healthy controls (HC): ***p < 0.001; **p < 0.01; *p < 0.05

The ANOVA based on the COPE in the dorsal ACC (dACC) ROI revealed a main effect of group driven by stronger activation in MDD patients compared with HC (F1,116 = 8.77, p < 0.01). This finding was confirmed by the ANOVA based on the RPE, which showed main effects of group (F1,116 = 10.11, p < 0.01) and condition (F3,116 = 123.69, p < 0.001), and an interaction effect (F3,116 = 6.18, p < 0.001). Pairwise comparisons showed that these effects were driven by larger group differences (MDD > HC) in the three WM conditions than in the fixation condition (Fig. 2c, d).

In the posterior cingulate cortex (PCC), the ANOVA based on the COPE revealed a marginally significant main effect of group that was driven by stronger deactivation in patients compared with controls (F1,116 = 5.35, p < 0.05). Notably, this finding was not confirmed by the ANOVA based on the RPE, which revealed a main effect of condition (F3,116 = 144.47, p < 0.001) and an interaction effect (F3,116 = 4.31, p < 0.01). Post hoc comparisons showed that these effects were driven by stronger PCC activity in patients during the fixation condition (Fig. 2e, f).

In the right inferior parietal lobe (IPL), the COPE-based ANOVA showed a main effect of group (F1,116 = 9.95, p < 0.01) that was driven by relative deactivation in patients and activation in controls. The ANOVA based on the RPE showed a significant interaction effect (F3,116 = 6.25, p < 0.001). Although pairwise comparisons did not reveal any significant differences, descriptive statistics suggest larger group differences in the fixation condition than in the WM conditions (Fig. 2g, h).

In the left superior temporal gyrus (STG)/insula, the COPE-based ANOVA showed a main effect of group (F1,116 = 8.76, p < 0.01) that was driven by relative deactivation in patients and activation in controls. The ANOVA based on the RPE showed a significant interaction effect (F3,116 = 5.51, p < 0.01). Post hoc comparisons showed that patients had marginally higher activity during the fixation condition, whereas no difference was observed during the WM conditions (Fig. 2i, j).

Using Pearson’s correlation coefficient, a significant negative correlation was found between dACC activation and reaction times for negative stimuli (r118 = −0.22, p = 0.019). Furthermore, significant positive correlations were found between rumination (reflection subscale) and activation of the right IPL (r46 = 0.4, p = 0.006) and of the dACC (r46 = 0.33, p = 0.027).

Discussion

Using multivariate pattern classification, this study demonstrated that depressed patients can be distinguished from healthy controls with good accuracy and sensitivity based on functional activation patterns during an emotional WM task. In particular, our prediction results outperform both majority-class and random-chance prediction. Our results are thus in line with a recent meta-analysis showing that MDD patients can be distinguished from healthy controls using different magnetic resonance imaging-based modalities [33]. Moreover, most functionally aberrant regions with discriminative power were located in the DMN, in regions involved in cognitive control, and in the DLPFC. The highest classification accuracies were achieved for neutral and negative stimuli. However, the contrast between emotional and neutral stimuli did not yield significant classification results.

In the left DLPFC, MDD patients showed higher BOLD responses during task conditions than HC. This finding is in accordance with a recently published meta-analysis of WM-related brain activity in MDD, which reported hyperactivation exclusively in the left DLPFC [12]. Importantly, even though this meta-analysis included data from 13 different WM experiments and therefore a wide range of stimuli, left DLPFC hyperactivation in MDD remained significant when task performance for predominantly verbal WM demands was matched to that of HC. Since our data do not indicate impaired task performance in MDD, our finding of left DLPFC hyperactivation is more consistent with the hypothesis that hyperfrontality during WM tasks reflects the need for greater activation to maintain a level of performance similar to that of HC [6,7,8]. With regard to cognition–emotion interactions, the DLPFC plays a major role in executive control processes, e.g., directing attention away from task-irrelevant emotional distractors during WM [34]. This is a key sub-process of effortful voluntary emotion regulation [35]. Greater recruitment of the DLPFC might therefore reflect increased resources devoted to performing the WM task while inhibiting the allocation of attention toward the processing of emotional stimuli. Interestingly, our data revealed that only the left, but not the right, DLPFC was hyperactive in MDD patients during the WM task. Notably, the WM task utilized in our study included only verbal stimuli, which are predominantly processed in the left hemisphere [2]. Numerous studies have shown biases toward negative emotional stimuli in MDD [19, 36] and, accordingly, reduced left DLPFC activity for positive stimuli [37]. Others, however, reported no differences in task performance or functional activity [6, 7]. The MVPC and ROI results presented here also argue against valence-specific effects with regard to left DLPFC activation.
Rather, hyperfrontality in MDD might occur to compensate for a lack of deactivation in regions of the DMN [9, 22], in order to maintain an effective functional loop between the respective regions and preserve behavioral performance. Consistently, our data show stronger PCC activation in MDD during rest and diminished deactivation during task conditions. The PCC is an important hub of the DMN and is characterized by increased activity at rest and deactivation during various emotional–cognitive tasks [9, 38]. It might also play a direct role in regulating the focus of attention by controlling the balance between internally and externally focused thought [39]. In the healthy brain, a failure of appropriate deactivation is associated with inefficient cognitive function [40]. It has been suggested that a failure to suppress PCC activity might reflect the intrusion of internal mentation into task performance [41]. Accordingly, increased PCC activation at rest and decreased deactivation during emotional and cognitive tasks have been reported in MDD [9, 42]. These findings might indicate a generally limited capacity for adaptive adjustment of this region in MDD patients [43]. Note that, based on the COPE (WM > fixation), patients showed stronger deactivation in the PCC, whereas this result was reversed for the RPE. This apparent contradiction arises from the high resting-state PCC activation in patients, which leads to a relatively stronger deactivation (task minus fixation) during the task conditions.
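The apparent COPE/RPE contradiction follows from simple arithmetic, because the COPE is the task estimate minus the fixation estimate. The numbers below are invented for illustration and are not the study's data. They show how a higher fixation estimate alone yields a more negative COPE, i.e., apparently stronger deactivation, even when the task estimates of both groups are identical.

```python
# Invented raw parameter estimates (RPE) for a single PCC-like ROI.
# These values are hypothetical and are NOT the study's data.
rpe = {
    "MDD": {"fixation": 2.0, "task": 1.0},
    "HC": {"fixation": 1.4, "task": 1.0},
}


def cope(group):
    """COPE for task > fixation: task estimate minus fixation estimate."""
    return rpe[group]["task"] - rpe[group]["fixation"]


for group in rpe:
    print(group, "task RPE =", rpe[group]["task"], "| COPE =", round(cope(group), 2))
# MDD: COPE = -1.0, HC: COPE = -0.4. The more negative COPE in MDD
# ("stronger deactivation") stems entirely from the higher fixation estimate.
```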

MDD patients showed higher BOLD responses than HC during the WM task in the dACC, extending to the supplementary motor area (SMA). The cognitive subdivision of the dACC has strong connections with DLPFC regions, the SMA, and parietal cortex. It has been implicated in response selection and the processing of cognitively demanding information. Activity in this region during WM tasks is often described in relation to increased effort, complexity, or attention [4, 5]. The higher activation in MDD patients observed here, regardless of valence, might help to maintain intact WM performance. On the other hand, abnormal dACC functioning has also been associated with biased attention to negative stimuli and with rumination [44]. Accordingly, our association analysis revealed a significant positive correlation between rumination and dACC activation.

In the right IPL, MDD patients showed higher activation at rest but diminished BOLD responses during task conditions compared with HC. This region is relevant for visuospatial processing and is usually recruited during n-back tasks. It also serves as an intermediate node between the cognitive control and default-mode networks [45]. Decreased BOLD responses in the IPL may reflect inadequate communication between these networks. As a result, larger areas of local cortex may need to be recruited to shift resources from internal (i.e., DMN-related) to external (i.e., cognitive control) functions during WM [46]. Although BOLD responses were decreased regardless of emotional valence, our data nevertheless suggest that IPL responses might be associated with rumination and result in longer reaction times for negative and neutral stimuli. A possible reason is that rumination disrupts the allocation of cognitive resources and increases recall of negative life events [47].

A cluster differentiating MDD from HC extended from the STG to the anterior insula. The anterior insula is viewed as an interface of cognitive, affective, and homeostatic mechanisms and is suggested to be an integral structure for stimulus-driven processing and monitoring of the internal milieu [48]. Previous work by Gu et al. [49] suggested that the anterior insula is part of a network integrating cognition and emotion [15]. Within this network, the anterior insula represents interoceptive changes of unique relevance to subjective experience. Control regions such as the DLPFC, in turn, maintain online representations of cognitive demands and stimulus features and implement goal-directed behavior [50]. All of these operations are required for cognition–emotion integration. When comparing rest and task within the MDD group, our data show higher activation during rest than during the task, which is consistent with previous findings [6, 12, 43]. This pattern might indicate increased interoceptive awareness or salience of internal stimuli at the expense of the salience of external stimuli, thereby impairing cognitive processing. This idea is supported by previous findings by Delaveau et al. [51] showing that symptom reduction induced by antidepressant medication is accompanied by increased insula activation during task-related conditions.

Our finding that emotional content did not affect WM performance in HC is in accordance with several previous studies that found no impact of emotion on WM performance [24, 25, 52]. In MDD patients, however, we found marginally slower reaction times for negative stimuli, while WM accuracy did not differ between MDD patients and HC. One could hypothesize that reaction times are more sensitive than accuracy to small modulations by emotional content. Emotional content may thus be more likely to modulate the efficiency with which information is processed than the accuracy with which it is held online. The recruitment of neural networks implicated in emotion processing might provide additional inputs to the WM system [53]. Consequently, many additional facets of information may need to be inhibited so that only the task-relevant information is processed during the WM task. This increased demand on inhibition may slow response times in MDD patients, especially for negative stimuli. The failure of MDD patients to inhibit or discard mood-congruent negative information might increase rumination and thereby underlie cognitive slowness and attentional deficits [54]. This interpretation is further supported by our findings of slower reaction times for negative stimuli in MDD patients and of a negative association between rumination and WM accuracy.

Although the present study overcomes some of the crucial shortcomings of previous reports with respect to sample size and applied statistics, some limitations should be acknowledged. Probably the most important limitation is that MDD is a very heterogeneous disease, and different subtypes might affect cognitive processes differently. Our misclassification rate of ~30% may be due in part to this heterogeneity. Further research may build on the current classification results to investigate disease subtypes and their relevance to treatment-response prediction. Furthermore, some of the MDD patients (N = 22) took different types of antidepressant medication during the study, which might have posed an additional source of variance. Although the applied MVPC approach may be relatively robust to noise in the data, this study was not designed to examine the effects of antidepressant medication. Control analyses (data not shown) did not reveal significant differences between medicated and unmedicated patients. However, medication types were considered too diverse to draw definite conclusions from these analyses. Upcoming studies should investigate the effects of specific antidepressants on brain activations during WM. From a methodological point of view, the classification model was evaluated on a single dataset only, so dataset-specific effects might have influenced the results. Validating the current findings in an independent sample is therefore an important next step.

To conclude, by applying MVPC, the present study demonstrates that functional activation patterns during an emotional WM task can be used to distinguish MDD patients from controls with good accuracy and sensitivity. While adequate WM performance in MDD is associated with frontal hyperactivation, patients show a lack of deactivation in regions of the DMN. This effect is most pronounced for negative and neutral stimuli and associated with rumination, suggesting an important role of aberrations in WM processing for cognition–emotion interactions in MDD.