Introduction

Substance use disorders like alcohol dependence (AD) are regarded as chronic, relapsing brain diseases.1, 2, 3 Although initial drug intake is associated with positive reinforcement and impulsive actions, chronic drug use is characterized by compulsive drug-taking behavior despite negative consequences.4, 5, 6 Animal research suggests that this shift from casual to compulsive drug use is based on increased habit formation at the expense of flexible, goal-directed action control.7, 8, 9, 10 Yet, little is known about the mechanisms underlying this process in human drug abuse, and experimental evidence regarding the neuro-adaptations that develop during chronic alcohol use in humans is lacking.

According to dual-systems accounts, two dissociable learning processes can be identified in instrumental behavior.11,12 On one hand, goal-directed actions are performed in order to achieve desirable goals (positive reinforcement) or to avoid undesirable outcomes (negative reinforcement). On the other hand, habitual responses are directly triggered by cues in the environment even when the outcomes have lost their goal value.13 Studies in rodents have shown that the dorsomedial striatum and prelimbic cortex are involved in goal-directed actions.13, 14, 15, 16, 17 Imaging studies in healthy human volunteers indicate that goal-directed control is associated with activation of the ventromedial prefrontal cortex (VMPFC),18,19 an equivalent of the animal prelimbic cortex, and by the caudate head.20 In contrast, habit learning in rats is mediated by the dorsolateral striatum,21,22 whereas in humans it mainly involves the posterior putamen.23 A recent diffusion tensor imaging study in healthy volunteers24 showed that white matter connectivity within corticostriatal pathways is predictive of individual differences in the balance between goal-directed and habitual control: estimated tract strength between caudate and VMPFC predicted goal-directed action, and white matter tract strength between posterior putamen and premotor cortex predicted habitual behavioral control.

Goal-directed actions are flexible but slow, whereas habitual responses are inflexible but have the advantage of being quick and automatic. Therefore, habit formation is a necessary and adaptive process. However, a gradual shift from goal-directed towards habitual control, mediated by phasic dopamine release in dorsolateral parts of the striatum,25 has been hypothesized to underlie chronic compulsive drug-seeking in substance use disorders like AD (for reviews, see refs. 26,27). Indeed, long-term drug seeking in rodents is insensitive to decreasing outcome values, known as outcome-devaluation,28,29 suggesting habitual control. Lesion and microdialysis studies show that the formation of drug habits is associated with a shift away from engagement of the prelimbic cortex30 and dorsomedial striatum31 towards more dorsolateral parts of the striatum9 with prolonged drug use.32 A dopamine-dependent cascading loop33 from ventral to more dorsal parts of the striatum21 is implicated in this course of habit formation. Reactivation of the hypoactivated prefrontal cortex in cocaine-dependent rats restores control over cocaine-seeking.30 Inactivation of the dorsolateral striatum on the other hand reverses habitual drug seeking in rats that were exposed to prolonged alcohol31 or cocaine use.32 Thus, evidence suggests increased involvement of the dorsolateral striatum in rodents31,34, 35, 36 and non-human primates37 in long-term drug seeking.

The animal studies reviewed above suggest that a progressive imbalance between goal-directed and habitual control is also likely to play a role in the development of compulsive drug habits in human substance abusers. To investigate this possibility, we examined the balance between goal-directed and habit learning, and its neural correlates, in abstinent AD patients and healthy controls (HC) during functional magnetic resonance imaging (fMRI) scanning, using an instrumental learning task developed to distinguish between goal-directed and habit-based learning.19 In addition, we examined whether this imbalance is dependent on disorder duration to test the theory that initial goal-directed actions are gradually replaced by stimulus-driven habitual behavior. We hypothesized stronger reliance on habits at the expense of goal-directed control in AD compared to HC, reflected in a relatively strong engagement of the neural habit pathway, comprising dorsolateral/posterior parts of the striatum (posterior putamen, caudate tail/body) and a relatively weak engagement of the goal-directed pathway in the VMPFC and dorsomedial/anterior parts of the striatum (caudate head, anterior putamen) during instrumental learning. Furthermore, this imbalance in behavior and its neural correlates was hypothesized to be more pronounced in longstanding as opposed to more recent-onset AD.

Materials and methods

Participants

To obtain a heterogeneous sample of AD participants with regard to the duration of the disorder, AD patients were recruited from two sources. One half of the group was recruited from addiction treatment clinics; the other AD patients and HC were recruited from the Netherlands Study of Depression and Anxiety (NESDA),38 a large multi-site naturalistic cohort study with participants from primary care and outpatient mental health services.

Forty-two patients were included with a current (<6 months) DSM-IV diagnosis of AD and no comorbid lifetime Axis-I diagnosis other than major depressive or anxiety disorders according to the Composite International Diagnostic Interview (CIDI).39 In addition, a group of 20 HC was selected without any lifetime Axis-I disorder. All participants were free of major internal or neurologic disorders and MRI contraindications, did not use psychotropic medication other than a stable use of selective serotonin reuptake inhibitors or infrequent benzodiazepine use, and were free of current substance use disorders (other than alcohol for the AD group and smoking). All participants were asked to abstain from alcohol for at least 24 h and from caffeine for a few hours prior to the assessments. All participants had a confirmed breath alcohol level of 0.00% (Alcoscan Daisy-AL7000, Sentech, Korea). No participant scored higher than 8 points on the CIWA-Ar withdrawal symptom checklist,40 and all were therefore considered withdrawal-free. Mean abstinence duration of AD patients was 12 days (see Table 1).

Table 1 Sample characteristics

We excluded participants who: (1) had low quality imaging data (1 HC, 2 AD); (2) responded on less than 90% of the trials during the instrumental learning task (2 AD); (3) showed performance below chance level during the training phase of the instrumental learning task (1 AD); and (4) urine-tested positive for drugs (cocaine) or benzodiazepines directly prior to the assessments (6 AD). The final dataset included 31 AD and 19 HC for analyses.
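The exclusion counts above can be reconciled with the final sample size in a few lines. This is only a bookkeeping sketch: it assumes the four exclusion criteria are mutually exclusive (no participant counted under two criteria), which the text implies but does not state, and the function name is hypothetical.

```python
# Enrolled participants per group, as reported in the Participants section.
enrolled = {"AD": 42, "HC": 20}

# Exclusions per criterion, as reported in the text.
# Assumption: no participant is counted under more than one criterion.
exclusions = [
    ("low-quality imaging data", {"AD": 2, "HC": 1}),
    ("responded on <90% of trials", {"AD": 2}),
    ("below-chance training performance", {"AD": 1}),
    ("positive urine test", {"AD": 6}),
]


def final_sample(enrolled, exclusions):
    """Subtract every exclusion count from the enrolled group sizes."""
    remaining = dict(enrolled)
    for _reason, counts in exclusions:
        for group, n in counts.items():
            remaining[group] -= n
    return remaining


final = final_sample(enrolled, exclusions)
print(final)  # {'AD': 31, 'HC': 19}
```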

Task paradigm

A detailed task description can be found in the Supplementary Methods. Briefly, the instrumental learning task was developed to distinguish between goal-directed and habit-based learning,19 and has been used successfully in studies to establish habits in both animals and humans.24,41, 42, 43, 44 The task consists of a discrimination training phase and an outcome-devaluation test phase.

During the discrimination training phase, participants learn by trial-and-error to respond (R) with a left or right button press to stimuli (S) in order to gain outcomes (O) that yield points representing monetary reward. Three trial types with different S–O contingencies can be distinguished: standard, congruent and incongruent (see Figure 1). A core feature of the task is the differential involvement of goal-directed and habit learning systems during the different trial types. During standard trials different outcome-pictures follow the stimulus-pictures after a correct response, whereas during congruent trials the outcome-pictures are identical to the preceding stimulus-picture. Learning the correct response to each stimulus during these two trial types can be established by using either the goal-directed S–O–R system or the habitual S–R system. Conversely, during incongruent trials, stimulus pictures in one trial are shown as outcomes in other trials and vice versa, that is, each picture functions as a stimulus and an outcome for opposing responses. In these trials, goal-directed learning is disadvantageous, and performance relies solely on habitual control via direct S–R associations. Indeed, previous studies have demonstrated that both humans and animals adopt an S–R habitual learning strategy to solve this incongruent trial type.19,41, 42, 43, 44 Contrasting blood oxygen level-dependent (BOLD)-signal of the different trial types during the discrimination training phase allowed us to study neural correlates of the balance between the goal-directed and habit learning systems during instrumental learning (for the contrasts, see section Imaging Data Acquisition and Analysis).

Figure 1

Three trial types with stimulus–response–outcome associations (left panel) and their involved learning systems (right panel). The standard and congruent discriminations can be resolved with both the goal-directed and the habit system. In contrast, the incongruent discrimination can only be resolved using the habit system. During instrumental learning in the flexible goal-directed system, the outcome is represented in the associative structure, allowing incentive evaluation of the outcome to impact on action selection. In the habit system, behavior is directly triggered by environmental stimuli via S–R associations, rendering action selection relatively efficient but also inflexible.
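To make the logic of the three trial types concrete, the following sketch encodes one possible set of S–R–O contingencies and checks which trial type puts a picture in conflict with itself. The picture names, the two-discriminations-per-type layout and the helper name are illustrative assumptions; the actual stimulus sets are described in the Supplementary Methods.

```python
from collections import namedtuple

Trial = namedtuple("Trial", "stimulus response outcome")

# Hypothetical picture names, chosen only for illustration.
CONTINGENCIES = {
    # Stimulus and outcome are different pictures.
    "standard": [Trial("apple", "left", "pear"), Trial("cherry", "right", "grape")],
    # Outcome picture is identical to the stimulus picture.
    "congruent": [Trial("banana", "left", "banana"), Trial("kiwi", "right", "kiwi")],
    # Each picture is a stimulus for one response and an outcome of the other.
    "incongruent": [Trial("orange", "left", "melon"), Trial("melon", "right", "orange")],
}


def has_response_conflict(trials):
    """True if a picture signals one response as stimulus but is earned by the other."""
    as_stimulus = {t.stimulus: t.response for t in trials}
    as_outcome = {t.outcome: t.response for t in trials}
    return any(
        pic in as_outcome and as_outcome[pic] != resp
        for pic, resp in as_stimulus.items()
    )


for trial_type, trials in CONTINGENCIES.items():
    print(trial_type, has_response_conflict(trials))
```

Only the incongruent set yields a conflict, which is why an outcome-based (S–O–R) strategy is unhelpful there and performance has to rest on direct S–R associations.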

Following the discrimination training phase, an outcome-devaluation test phase assesses the strength of goal-directed R–O associations. Here, some of the outcomes are devalued and participants have to use their knowledge of the response–outcome (R–O) relationships to (re)direct their choices towards still-valuable outcomes. Individuals who merely acquired knowledge about habitual S–R associations during the training phase have more difficulty completing the test phase accurately than individuals whose instrumental learning was mainly based on goal-directed S–O–R associations.
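The reasoning behind the devaluation test can be sketched as a lookup in R–O knowledge. The trial format assumed here (two outcomes on offer, exactly one of them devalued), the picture names and the function name are hypothetical simplifications, not the task implementation.

```python
def goal_directed_choice(response_for_outcome, outcomes_on_offer, devalued):
    """Return the response that earns the still-valuable outcome, or None if unknown."""
    valuable = [o for o in outcomes_on_offer if o not in devalued]
    if len(valuable) != 1:
        raise ValueError("expected exactly one still-valuable outcome on offer")
    # Without an R-O association for this outcome the lookup fails (None),
    # which corresponds to having to guess.
    return response_for_outcome.get(valuable[0])


# Hypothetical R-O knowledge acquired during training (picture names are made up).
r_o_knowledge = {"pear": "left", "grape": "right"}

informed = goal_directed_choice(r_o_knowledge, ("pear", "grape"), devalued={"pear"})
uninformed = goal_directed_choice({}, ("pear", "grape"), devalued={"pear"})
print(informed)    # right
print(uninformed)  # None
```

A learner who stored only S–R links has nothing to look up once the stimuli are replaced by outcomes, which is the sense in which test accuracy indexes goal-directed knowledge.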

During the discrimination training phase and the outcome-devaluation test phase, we assessed the total percentage of correct responses (task accuracy) and response times for each trial type separately. Learning effects over time were measured across the six equal blocks of the training phase. The task was administered twice, once with neutral (fruit) pictures and once with (simple) alcohol pictures (in counterbalanced order). However, picture type did not yield any differences in behavioral or neuroimaging data. Therefore, we pooled the two sessions for the final analyses in order to increase power.

Statistical analyses

Sample characteristics and task performance data were analyzed using SPSS 16.0. After testing for normality using Shapiro–Wilk’s test, we used t-tests and whenever necessary non-parametric Mann–Whitney U or χ2 tests to examine group differences. Correlation analyses were performed using Spearman’s ρ.

Task accuracy and response times of the discrimination training phase and outcome-devaluation test phase were entered in a series of repeated measures ANOVAs with trial type (standard/congruent/incongruent) and (for the training phase) block as within-subject factors, and group as between-subjects factor. Post-hoc tests of significant main effects were analyzed using paired t-tests with Bonferroni correction for multiple comparisons. To test whether initial goal-directed control is gradually replaced by stimulus-driven habitual control in chronic AD, we assessed the association between AD duration and task performance in regression analyses within the AD group.
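As an illustration of the Bonferroni step for the post-hoc tests, the sketch below enumerates the pairwise trial-type comparisons and scales each raw P value by the number of comparisons. The raw P values are made-up placeholders, not study results, and SPSS may report the adjustment in a slightly different form.

```python
from itertools import combinations

TRIAL_TYPES = ["standard", "congruent", "incongruent"]

# Three trial types give three pairwise comparisons.
PAIRS = list(combinations(TRIAL_TYPES, 2))


def bonferroni(p_values):
    """Multiply each raw P value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


# Placeholder raw P values for illustration only (not taken from the study).
raw = dict(zip(PAIRS, [0.004, 0.03, 0.40]))
adjusted = bonferroni(raw)
for pair, p in adjusted.items():
    print(pair, round(p, 3))
```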

All SPSS analyses were performed two-tailed with an alpha of 0.05, and all repeated-measure factors are reported with Greenhouse–Geisser sphericity corrections (Pgg).

Imaging data acquisition and analysis

Functional magnetic resonance imaging was performed at the Academic Medical Centre in Amsterdam using a 3T Philips Intera full-body MR-system (Philips Medical Systems, Best, The Netherlands) with a phased array SENSE RF 8-channel head coil. Functional BOLD signals were sequentially acquired with a T2*-weighted, gradient-echo planar imaging sequence and an estimated time-course series of approximately 360 volumes (self-paced) per session (TR/TE=2300 ms/20 ms; matrix size=96 × 95; voxel size=2.29 × 2.29 × 2.50 mm3; slices=45). Imaging parameters were optimized according to the methods of Deichmann and colleagues45 to minimize susceptibility and distortion artifacts in the orbitofrontal cortex: each volume was scanned with an orientation of 30° from Anterior-Posterior Commissure line. A T1-weighted structural scan was made for anatomical reference with the fMRI data (TR/TE=9 ms/3.6 ms; matrix size=256 × 231; voxel size=1 × 1 × 1 mm3; slices=170).

Imaging analyses were performed using SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK). Images were manually reoriented to the Anterior-Posterior Commissure line, slice-time corrected, realigned, warped to Montreal Neurological Institute (MNI) space, and smoothed using a Gaussian kernel of 8 mm full-width-at-half-maximum. To remove low-frequency signal drift, a high-pass filter (cutoff 1/128 Hz, that is, a 128 s period) was applied.
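As a plausibility check on the temporal filtering, the sketch below compares a high-pass cutoff with the highest frequency that can be represented at the reported TR. The 128 s cutoff period is an assumption (it is the SPM default), not a value taken from the authors' scripts, and the helper name is hypothetical.

```python
TR_S = 2.3               # repetition time from the acquisition parameters
CUTOFF_PERIOD_S = 128.0  # assumed: SPM's default high-pass cutoff period

sampling_hz = 1.0 / TR_S        # one volume every TR seconds
nyquist_hz = sampling_hz / 2.0  # highest frequency the time series can represent
cutoff_hz = 1.0 / CUTOFF_PERIOD_S


def is_representable(freq_hz, nyquist_hz):
    """A cutoff is only meaningful if it lies below the Nyquist frequency."""
    return 0.0 < freq_hz < nyquist_hz


print(f"Nyquist frequency: {nyquist_hz:.4f} Hz")
print(f"High-pass cutoff:  {cutoff_hz:.4f} Hz")
print(is_representable(cutoff_hz, nyquist_hz))  # True
print(is_representable(128.0, nyquist_hz))      # False
```

With a TR of 2.3 s the series cannot carry frequencies above roughly 0.22 Hz, so a drift filter has to be specified as a period of many seconds rather than as a cutoff of many hertz.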

Statistical analysis of imaging data was performed using the General Linear Model.46 Trial onsets were modeled with a delta function convolved with a synthetic hemodynamic response function and modulated for stimulus length (2000 ms). Obtained parameter estimates of the three trial types from the discrimination training phase were entered in a group × trial type full-factorial second-level (random effects) mixed-ANOVA model.
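The first-level modeling step can be sketched as building a predicted BOLD regressor from trial onsets. This is a simplified stand-in for what SPM does internally: the double-gamma shape parameters (6 and 16, undershoot ratio 1/6) are commonly used values assumed here, the 2 s stimulus is modeled as a boxcar, the onset is made up, and all function names are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, zero for non-positive t."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)


def hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma hemodynamic response: early peak minus a late undershoot."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def predicted_bold(onsets_s, duration_s, n_scans, tr_s, dt=0.1):
    """Convolve a boxcar of stimulus events with the HRF and sample once per TR."""
    n_fine = int(round(n_scans * tr_s / dt))
    neural = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        for i in range(start, min(stop, n_fine)):
            neural[i] = 1.0
    kernel = [hrf(k * dt) for k in range(int(round(32.0 / dt)))]  # 32 s of HRF
    bold = [0.0] * n_fine
    for i, x in enumerate(neural):
        if x == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            bold[i + k] += x * h * dt
    step = tr_s / dt
    return [bold[min(int(round(s * step)), n_fine - 1)] for s in range(n_scans)]


# One hypothetical trial starting at 10 s, stimulus length 2 s, TR 2.3 s.
regressor = predicted_bold([10.0], duration_s=2.0, n_scans=40, tr_s=2.3)
peak_scan = regressor.index(max(regressor))
print(peak_scan, round(peak_scan * 2.3, 1))
```

One such regressor per trial type enters the design matrix, and its fitted weight is the parameter estimate carried to the second level.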

During second-level analyses, we first assessed instrumental learning irrespective of the involvement of goal-directed or habit system, by examining BOLD-responses during all three trial types (standard, congruent and incongruent) contrasted against an implicit baseline. Second, to specifically assess the neural correlates of goal-directed learning, we contrasted the standard and congruent trials (both involving S–O–R goal-directed and S–R habit learning) against the incongruent trials (only involving S–R habit learning) according to the contrast: ((standard+congruent)>incongruent). Third, since the incongruent trials rely on S–R learning only, we compared the groups on these trial types in order to examine the involvement of brain pathways associated with S–R habit learning. To investigate the neural associations of instrumental learning with duration of AD, first level parameter estimates for these abovementioned three contrasts of interest were entered into separate second-level regression analyses with duration of AD as predictor variable within the AD group.
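The three contrasts of interest can be written as weight vectors over the trial-type parameter estimates. The specific weights below are an assumption (the text gives only the direction of each contrast, not its scaling), and the beta values are placeholders rather than data.

```python
CONDITIONS = ("standard", "congruent", "incongruent")

# Assumed weights; only the sign pattern follows from the text.
CONTRASTS = {
    "instrumental_learning": (1.0, 1.0, 1.0),  # all trial types vs implicit baseline
    "goal_directed": (0.5, 0.5, -1.0),         # (standard + congruent) > incongruent
    "habit": (0.0, 0.0, 1.0),                  # incongruent trials only
}


def contrast_value(betas, weights):
    """Weighted sum of the per-condition parameter estimates."""
    return sum(betas[cond] * w for cond, w in zip(CONDITIONS, weights))


# Placeholder parameter estimates for a single voxel (not study data).
example_betas = {"standard": 2.0, "congruent": 4.0, "incongruent": 1.0}
for name, weights in CONTRASTS.items():
    print(name, contrast_value(example_betas, weights))
```

The goal-directed weights sum to zero, so that contrast reflects a difference between trial types rather than overall task activation.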

All imaging analyses were performed whole-brain, but we focused on areas we found in the analyses that are known to be involved in goal-directed and habit learning:18, 19, 20,23,24,47 VMPFC, anterior putamen and caudate head for goal-directed learning, and posterior putamen and caudate tail/body for habit learning. Anterior versus posterior putamen was demarcated at MNI-coordinate y=2 (anterior>2; posterior<2), in accordance with a previous connectivity study using the same task.24 The mask for the VMPFC and striatum (caudate and putamen) was derived from the automatic anatomical labelling atlas incorporated in the WFU PickAtlas Tool (v2.5.2)48 (Figure 2).
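The anterior/posterior putamen split reduces to a comparison on the MNI y-coordinate. In this small sketch the function name is hypothetical, and the handling of a peak at exactly y=2 is an assumption, since the text does not assign that value to either subregion.

```python
def putamen_subregion(y_mm, boundary_mm=2.0):
    """Classify a putamen peak by its MNI y-coordinate (in mm)."""
    if y_mm > boundary_mm:
        return "anterior"
    if y_mm < boundary_mm:
        return "posterior"
    # Assumption: the text leaves y == 2 unassigned, so it is flagged separately.
    return "boundary"


print(putamen_subregion(8.0))   # anterior
print(putamen_subregion(-6.0))  # posterior
```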

Figure 2

Masks used for areas of interest, derived from the automatic anatomical labelling atlas in WFU Pickatlas. Yellow: ‘frontal_sup_medial’ & ‘frontal_med_orb’ covering the medial prefrontal cortex, and a defined delineation at z<25 for ventral orientation. Red: caudate & putamen, covering the striatum.

Main effects are reported at significance P<0.05 whole-brain family-wise error (FWE) corrected (PFWE) for multiple comparisons. Group comparisons and regression analyses with duration of AD were examined at P<0.005 uncorrected (Puncorr.) with a cluster-extent of 5 voxels to reduce the risk of Type-II error. To further protect against Type-I error, group comparisons were corrected for multiple comparisons on the voxel-level by using FWE-correction as implemented in SPM within the masks of interest, using a small-volume correction49 by centering a 10 mm-radius sphere around peak coordinates of activated clusters, in line with previous studies.19,50,51 The resulting volumes of interest had to meet P<0.05 FWE small-volume corrected (PFWE-SVC) to be considered significant.
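The geometry and the decision rule of the small-volume step can be sketched as follows. This does not reproduce SPM's random-field correction itself; it only shows which coordinates fall inside a 10 mm sphere and how the P<0.05 criterion is applied. The peak coordinate and function names are hypothetical.

```python
import math


def in_sphere(voxel_mm, peak_mm, radius_mm=10.0):
    """True if a voxel centre lies within the sphere around the peak (MNI mm)."""
    return math.dist(voxel_mm, peak_mm) <= radius_mm


def is_significant(p_fwe_svc, alpha=0.05):
    """Decision rule stated in the text: small-volume corrected P below 0.05."""
    return p_fwe_svc < alpha


# Hypothetical peak coordinate, for illustration only.
peak = (-28.0, -4.0, 6.0)
print(in_sphere((-24.0, -2.0, 8.0), peak))   # True: about 4.9 mm from the peak
print(in_sphere((-28.0, -16.0, 6.0), peak))  # False: 12 mm from the peak
print(is_significant(0.044), is_significant(0.107))
```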

Results

Sample characteristics

For detailed information on sample characteristics, see Table 1. Briefly, groups were matched on age, gender, education, handedness and experienced stress. The AD group contained smokers, whereas the HC group did not, and the AD group scored higher than the HC group on alcohol use as well as depression and anxiety severity.

Task performance

Discrimination training

Overall, participants showed the expected learning effect (F4,107=103.695; Pgg<0.001) and learned to respond on all trial types above chance level during the final block (all P<0.005). In line with previous studies a main effect of trial type was observed (F1,961=8.155; Pgg<0.005): performance on incongruent trials was inferior compared to congruent trials (Pbonf=0.001) and marginally inferior compared to standard trials (Pbonf=0.072), whereas performance on congruent and standard trials did not significantly differ (Pbonf=0.221). For a plot depicting learning during discrimination training, the reader is referred to Supplementary Figure S2. No differences in performance between AD and HC were observed, indicating that exposure to the S–R–O contingencies was similar in both groups.

Response times became faster as a consequence of training (F3,178=62.41; Pgg<0.005). A main effect of trial type (F1,923=24.57; Pgg<0.005) on response times was found with longer response times for the incongruent compared to the congruent and standard trials, and for the standard trials compared to the congruent trials. There was no speed–accuracy trade-off.

Outcome-devaluation test

During the outcome-devaluation test phase, a main effect of trial type was found (F1,63=37.412; P<0.005): performance on the incongruent trials was inferior to the other two trial types (Pbonf<0.005 for both comparisons), and performance on the standard trials was inferior to the congruent trials (Pbonf=0.005). Performance was significantly above chance level on congruent and standard trials (for both groups) (all P<0.001), but not on incongruent trials, indicating that discriminative actions on incongruent trials were exclusively mediated by habitual S–R associations, whereas actions on standard and congruent trials additionally benefited from the goal-directed system encoding the R–O relationships.

Importantly, a significant main effect for group regardless of trial type was found for outcome-devaluation (F1,48=6.07; P=0.017) with poor performance in the AD group compared to HCs, indicating impaired R–O knowledge for goal-directed action in AD. For a graphic representation of these findings the reader is referred to Supplementary Figure S3.

Imaging results

Instrumental learning (All trial types)

This analysis revealed that the posterior putamen (Z=6.38; PFWE<0.05) and dorsal caudate nucleus (Z=5.68; PFWE<0.05) were activated during instrumental learning (main effect). Group comparisons showed that during instrumental learning, the posterior putamen was more active in AD than in HC (Z=3.22; PFWE-SVC=0.044). Furthermore, although the main effect did not reveal VMPFC involvement, HC did activate the VMPFC more than AD (Z=3.67; PFWE-SVC=0.012) (Figure 3-I, panels b and a, respectively).

Figure 3

Group comparisons of BOLD response during the instrumental learning task. Displayed at P<0.005 whole-brain uncorrected, extent threshold >5 voxels: (I) Group comparisons during instrumental learning in general; (a) larger involvement of the VMPFC (Z=3.67) in HC compared to AD; (b) larger involvement of the posterior putamen (Z=3.22) in AD compared to HC. (II) Group comparisons during goal-directed learning: larger involvement of the VMPFC (Z=3.45) and anterior putamen (Z=3.63) in HC compared to AD. (III) Group comparisons during S–R habit learning: larger involvement of the posterior putamen (Z=3.37) in AD compared to HC. Abbreviations: AD, alcohol dependent; BOLD, blood oxygen level-dependent; HC, healthy controls; VMPFC, ventromedial prefrontal cortex; Z, standard score Z-value.

Goal-directed learning [(Standard+Congruent)>Incongruent]

The main effect of goal-directed learning showed VMPFC involvement (Z>6; PFWE<0.05). Moreover, this activation of the VMPFC was significantly more pronounced in HC than in AD (Z=3.45; PFWE-SVC=0.023). In addition, HCs showed significantly stronger activation of the anterior putamen compared to AD (Z=3.63; PFWE-SVC=0.013) (Figure 3-II). The AD group showed no regions with greater activation compared to the HCs.

Habit learning (Incongruent trials)

The main effect of S–R habit learning showed activation of the posterior putamen (Z=7.48; PFWE<0.05) and the dorsal caudate nucleus (Z=4.95; PFWE<0.05). There was more activation in the posterior putamen (Z=3.37; PFWE-SVC=0.029) in AD compared to HC (Figure 3-III). The HCs did not show increased activation in the areas of interest compared to AD.

Post-hoc

Since AD and HC significantly differed in smoking and depression/anxiety symptoms, we added these variables as covariates in a post-hoc mixed-ANOVA model for behavioral data as well as BOLD response. The reported group differences remained significant.

Associations with AD duration

AD duration was not correlated with task performance in the discrimination training phase or the outcome-devaluation phase. Regression analysis of AD duration with brain activity during the goal-directed learning contrast [(Standard+Congruent)>Incongruent] showed that the VMPFC was less active in AD patients with a longer duration of AD (Z=2.97; Puncorr.<0.005) (Figure 4). After correcting for multiple comparisons, this cluster no longer reached significance (ρ=−0.569; PFWE-SVC=0.107). Since AD duration was correlated with age (ρ=0.486; P<0.01), we performed post-hoc multiple regression and partial correlation analyses in SPM, which revealed that VMPFC activation was inversely correlated with AD duration, but not with age (data not shown).

Figure 4

Regression of VMPFC activity with duration of alcohol dependence. Displayed at P<0.005 whole brain uncorrected, extent threshold >5 voxels. Scatterplot shows a negative correlation between parameter estimates in the VMPFC and duration of AD. Correlation coefficient is meant for visualization purposes only. AD, alcohol dependent; VMPFC, ventromedial prefrontal cortex.

Discussion

To the best of our knowledge, this is the first study providing behavioral as well as neurophysiological evidence for an imbalance between goal-directed and habitual control in humans with a substance use disorder. We demonstrate that AD affects the ability to base instrumental actions on goal-directed R–O knowledge. This effect was not due to differences in exposure to the S–R–O contingencies, since AD patients and HCs performed equally well during the training phase. These results are therefore in line with the hypothesis of impaired goal-directed action and subsequent overreliance on S–R based habit learning in AD. Furthermore, we demonstrate that this imbalance is associated with decreased activity in the VMPFC and anterior putamen, brain regions previously implicated in goal-directed learning,18, 19, 20 and increased activity in the posterior putamen, a region involved in habit learning.23

A similar imbalance between goal-directed and habitual control has previously been shown to underlie several other psychopathological conditions associated with alterations in dopaminergic transmission. For example, patients with Parkinson’s disease show an imbalance between goal-directed and habit learning.42 The involvement of dopamine in goal-directed action is further supported by a study using an experimental dopamine depletion paradigm.43 Moreover, patients with an obsessive-compulsive disorder exhibit a similar imbalance,44 suggesting a shared mechanism across different forms of compulsive behavior.52 Finally, impulsivity, a trait repeatedly found in substance use disorders, is associated with excessive reliance on S–R habits at the expense of goal-directed behavior.53

Two studies provide indirect evidence for an imbalance between goal-directed and habitual pathways in humans with substance use disorders. A recent fMRI study using a cue-reactivity paradigm54 reported that heavy drinkers (some with AD) showed significantly higher cue-induced activation of the dorsal striatum, whereas light social drinkers showed higher cue-induced activations in prefrontal areas and the ventral striatum. The authors interpreted these findings as an indication for a shift from ventral to dorsal striatal involvement in the course of a developing alcohol use disorder; a shift that is thought to be associated with the increasing role of habit-like drug seeking behavior. In a pharmacological fMRI study, compulsivity and its neural correlates were assessed using a reversal learning task.55 Stimulant-dependent subjects made significantly more perseverative responses than healthy volunteers, an effect that was negatively correlated with activation in the right caudate nucleus and normalized after administration of the dopamine D2/3 agonist pramipexole. It should be noted, however, that in contrast to the present study, these previous studies did not specifically tap into the neural correlates of goal-directed or habit learning.

Although animal studies allow experimental initiation of substance dependence to track the gradual formation of substance-specific habits, ethical and practical issues evidently preclude such investigations of drug-specific habits in humans. Therefore, in the present study we took the approach of studying instrumental learning in the context of alcohol-related pictures versus neutral (fruit) pictures, with the aim of experimentally isolating alcohol-specific versus general habits. However, we found no significant differences between these stimulus types in performance or imaging data, suggesting a general shift towards habitual behavior, which is expressed even in the context of AD-irrelevant stimuli.

This observed general shift towards S–R habits could be a consequence of chronic alcohol use by repeated exposure to the harmful effects of drugs of abuse31,56,57 or a predisposing trait or vulnerability for the development of compulsive and impulse regulation disorders, or both.53,58 A pre-existing habitual trait could increase the risk of substance abuse and dependence after initial drug intake, but persistent substance use could also exacerbate the habitual trait already present, thus rendering users more prone to develop chronic and habitual drug-seeking behaviors. In this way, a vicious cycle is initiated, ultimately spiraling into clinical dependence.10 The current study reports a negative correlation between AD duration and engagement of the goal-directed pathway (VMPFC) during instrumental learning, suggesting that persistent substance abuse could play at least some role in the excessive reliance on habitual learning in AD patients. However, the reported negative association between goal-directed VMPFC activity and AD duration was subthreshold only, and should therefore be interpreted with caution. In addition, with the current cross-sectional study design, it is not possible to determine whether unbalanced goal-directed and habitual control systems are a consequence of detrimental alcohol use, or whether they reflect a premorbid individual difference which results in an increased risk for developing alcohol dependence. Prospective or longitudinal studies using methods similar to those used here are needed to assess whether an individual's degree of reliance on habitual control reflects a consequence of dependence, or is a risk factor in the development of early and chronic substance dependence.

We were the first to use an instrumental learning task like the one designed by de Wit et al.19 in an addicted patient group, in combination with fMRI. This study exemplifies an approach that is still exploratory and requires further development for these preliminary data to be considered definitive. However, the data are nevertheless consistent with a coherent theoretical approach derived from translational studies indicating an important role for habit-based behavior in stimulant drug addiction, now applied to human alcohol dependence. Therefore, the current findings contribute to the conceptualization of addiction as an overreliance on stimulus-driven habits at the expense of flexible, goal-directed action,3 leading to frequent and persistent substance use despite serious negative consequences. To date, treatment programs for AD have mainly tried to reduce the positive reinforcing properties of drugs (‘blocking the buzz’ for example, with the opioid antagonist naltrexone) or to reduce the negative reinforcing aspects of chronic alcohol use represented by relief craving (‘curing the blues’ for example, with acamprosate).5 While these outcome reevaluation interventions are of crucial importance to adjust the goal status of addictive substances, recent research has provided evidence that addictive stimulus-driven behavior may still persist.59 Therefore, the findings of the current study suggest that new pharmacological or psychotherapeutic interventions should be developed to target inflexible, habitual drug-seeking.