The dopaminergic midbrain is associated with reinforcement learning, motivation and decision-making – functions often disturbed in neuropsychiatric disorders. Previous research has shown that dopaminergic midbrain activity can be endogenously modulated via neurofeedback. However, the robustness of endogenous modulation, a requirement for clinical translation, is unclear. Here, we examine whether the activation of particular brain regions associates with successful regulation transfer when feedback is no longer available. Moreover, to elucidate mechanisms underlying effective self-regulation, we study the relation of successful transfer with learning (temporal difference coding) outside the midbrain during neurofeedback training and with individual reward sensitivity in a monetary incentive delay (MID) task. Fifty-nine participants underwent neurofeedback training either in standard (Study 1 N = 15, Study 2 N = 28) or control feedback group (Study 1, N = 16). We find that successful self-regulation is associated with prefrontal reward sensitivity in the MID task (N = 25), with a decreasing relation between prefrontal activity and midbrain learning signals during neurofeedback training and with increased activity within cognitive control areas during transfer. The association between midbrain self-regulation and prefrontal temporal difference and reward sensitivity suggests that reinforcement learning contributes to successful self-regulation. Our findings provide insights in the control of midbrain activity and may facilitate individually tailoring neurofeedback training.
The dopaminergic midbrain, including the ventral tegmental area (VTA) and substantia nigra (SN), plays a crucial role in reward processing, reinforcement learning1,2,3,4, motivation5,6, and decision-making7. Dysfunctions of the reward system have far-reaching consequences and are associated with the development of psychiatric disorders, such as addiction8 and schizophrenia9,10. Despite decades of extensive neuroscience and imaging studies which have contributed to an impressive body of knowledge of normal and abnormal reward system function, the neural mechanisms controlling midbrain activity are still not fully understood11. One key issue that has received increasing attention is whether humans are able to cognitively control brain activity within the reward system. It has already been shown that both healthy controls12,13, and patients with cocaine addiction14 can use visually displayed SN/VTA activity to learn to regulate it during real-time functional magnetic resonance imaging (rt-fMRI) neurofeedback training by means of mental imagery. However, the outcome of primary interest in neurofeedback training is a transfer beyond training itself, i.e., the ability to regulate activity also after training and without feedback.
Transfer of neurofeedback training is critical for clinical applications, including those involving disorders of the reward system15, but the evidence is mixed. Specifically, MacInnes and colleagues13 observed significant neural transfer effects following rt-fMRI training in the form of increased neural activity and connectivity of the VTA during transfer on the group level. However, other studies12,14 using neurofeedback training of the SN/VTA found little group effects and substantial between-subject variation in self-regulation performance at transfer. Here, we go beyond the prior work by investigating how this variation arises, how individuals with more transfer success differ from those with less transfer success, and whether activity in brain regions other than the VTA characterizes individuals with successful transfer. We also ask whether individual differences in learning (defined as change in SN/VTA activity during training, when feedback is available) are related to individual non-midbrain transfer success (change in neural activity outside the SN/VTA in post-training session relative to baseline, both without feedback). Thus, the main contribution of our study is to characterize the neural mechanisms related to successful self-regulation of SN/VTA activity. Specifically, we combine data from two previous rt-fMRI neurofeedback studies12,14 and pursue three aims.
(Aim 1) Our first goal was to characterize often neglected16 individual differences in the degree of successful transfer of SN/VTA self-regulation and thereby differentiate regulators from non-regulators. We reasoned that the cognitive (or executive) control network is in a prime position for performing a demanding task such as imagery17 and shaping subcortical brain regions. Indeed, animal and human studies have shown direct and indirect anatomical connections between prefrontal cortex and SN/VTA18,19,20,21 and functional studies using electrophysiology22, optogenetics23, or fMRI24 have corroborated the physiological relevance of prefrontal-SN/VTA interactions. Therefore, we test the hypothesis that successful transfer of SN/VTA regulation is associated with activation in brain regions that are part of the cognitive (executive) control network, especially prefrontal areas.
(Aim 2) Our second goal was to determine whether the framework of (operant) associative learning can be used to explain neurofeedback training. In applications of the associative learning framework to neurofeedback17,25, the feedback provides a higher order reward and the chosen mental strategy is reinforced in proportion to the sign and magnitude of the feedback. At the beginning of the training, participants cannot predict which strategy will consistently lead to up- or downregulation of brain activity within the target region. Therefore, if they use an adequate strategy, participants receive more reward than predicted corresponding to a positive temporal difference between consecutive (actions in) states. As a consequence, they would be more likely to repeat the strategy, expect higher feedback next time and gradually learn how to keep the feedback signal within the dopaminergic midbrain high. Accordingly, in regulators the size of the temporal difference between feedback states should gradually decrease as the expected feedback increasingly converges with the actual feedback. In contrast, for non-regulators and participants in a control group receiving unrelated or unstable feedback, the temporal differences would remain large and variable because these participants cannot learn any association between mental strategies and feedback. These straightforward implications of current theorizing about the mechanisms underlying neurofeedback remained largely untested (for a simulation study on the temporal dynamics of feedback: Oblak and colleagues26; for the correlation of BOLD with signal increase (i.e. success) and decrease (i.e. failure) during regulation: Radua and colleagues27). Here, we investigate the temporal difference mechanism in regions that communicate with the SN/VTA by testing for non-midbrain associations with midbrain temporal difference signals. Note that the SN/VTA has been traditionally associated with the coding of reward prediction errors in both animal2,28,29 and human research30,31. Furthermore, the causal sufficiency of dopaminergic prediction error signals for learning has been reinforced by optogenetics32,33. Together, we hypothesize that decreasing SN/VTA temporal difference signals during neurofeedback learning are associated with successful self-regulation as expressed in positive transfer success in non-midbrain regions. In other words, we test whether non-midbrain regions show a negative correlation with temporal difference signals as coded by the midbrain over the course of the experiment and relate this decrease to transfer success.
(Aim 3) Our third goal was to further distinguish regulators from non-regulators. Hence, we related the individual differences in the ability to self-regulate midbrain activity to individual markers of neural reward sensitivity. We asked whether successful self-regulation, as measured by transfer effects, taps into general properties of the reward system. Given that adaptive reward processing characterizes the SN/VTA1,34 we used a variant of the monetary incentive delay (MID) task that captures differences in adaptive reward sensitivity between clinical and non-clinical populations35. Using this task, we tested the hypothesis that reward processing in regions that may control the dopaminergic midbrain is related to successful SN/VTA self-regulation.
In sum, to study individual differences in the capability to gain control of the SN/VTA we used rt-fMRI neurofeedback training in healthy participants receiving either positive-going feedback (standard group) or inverted (negative-going) feedback (control group). We quantified the individual degree of successful midbrain transfer by comparing the individual post-training versus pre-training self-regulation capabilities. Moreover, we related individual differences in reward sensitivity to separately measured midbrain transfer success.
No difference in degree of regulation transfer (midbrain DRT) across groups
We first evaluated the midbrain DRT measure (Fig. 1 and Supplementary Fig. 1) and compared it between the three datasets. There were no significant differences across all three groups (mean DRT standard group Study 1 = 0.01, mean DRT standard group Study 2 = −0.02, mean DRT inverted group Study 1 = −0.05. Both parametric ANOVA: F(2, 56) = 0.13, p = 0.8 and non-parametric Kruskal-Wallis: H(2) = 0.39, p = 0.82 tests concurred on differences not being significant). Moreover, also the direct comparison between the two standard feedback groups was not significant (T(39) = −0.26, p = 0.8). Accordingly, we combined the two standard groups for subsequent analyses. Importantly, our participants showed considerable variation in DRT, which allowed us to investigate the individual differences in brain activity accompanying more or less successful transfer of SN/VTA self-regulation through neurofeedback. Thus, the groups showed similar mean levels and considerable individual differences in transfer success.
Correlation of slopes between transfer success and self-regulation during training only for standard group (manipulation check)
As a manipulation check, we investigated the relation between training and transfer success. Specifically, we determined the slope of SN/VTA signal increase over training (i.e., the averaged difference of IMAGINE_REWARD – REST blocks in the second neurofeedback training run – first neurofeedback training run) and related it to midbrain DRT in Spearman’s correlations for the standard and inverted feedback group. We found positive correlations between the slope of SN/VTA signal change during training period and midbrain DRT only for the standard feedback group, but not for the control group (Supplementary Fig. 2; standard group ρ = 0.62, p < 0.001; inverted group ρ = −0.3, p = 0.25, one-sided test for difference between correlations in independent samples z = 3.03, p = .001). Although the comparability between training and transfer is limited due to the feedback signal processing, particularly those individuals who were more successful at transfer also showed stronger upregulation during training and this relation was specific to the standard feedback group.
Individual variation in transfer: midbrain DRT associated with cognitive control network in standard and amygdala activity in inverted feedback group (aim 1)
Standard feedback group
We investigated whether individual levels of successful SN/VTA self-regulation (midbrain DRT) were associated with increased post- minus pre-training activity in other regions of the brain, i.e., non-midbrain DRT. This analysis revealed several areas consistently reported by neurofeedback studies (see Fig. 2 in the literature review of Sitaram et al.17), including dorsolateral prefrontal cortex (dlPFC), anterior cingulate cortex (ACC), lateral occipital cortex (LOC), and thalamus (Fig. 2a and Table 1). To formally test for a more general association with the cognitive control network, we applied a cognitive control network template from a meta-analysis36, which in addition revealed neural activity in precuneus and striatum (Fig. 2b for exemplary illustrations of dlPFC, ACC, temporal gyrus, and thalamus activity; Supplementary Table 1 for full overview). Thus, regions of the cognitive control network showed transfer to the extent that neurofeedback training of the dopaminergic midbrain was successful.
Inverted feedback group
For the inverted feedback group, the same analysis resulted in partly distinct activations. In contrast to the standard feedback group, left amygdala activity correlated significantly with midbrain DRT (Fig. 3 and Supplementary Table 2). Importantly, activity in cognitive control areas identified for the standard feedback group, such as dlPFC and ACC, was significantly weaker in inverted than standard feedback groups (Supplementary Table 3 for disjunction and direct statistical comparison). Together with the lack of correlation of midbrain DRT with SN/VTA signal change during training for the inverted feedback group, these findings suggest that cognitive control regions play a preferential role for successful transfer of SN/VTA self-regulation.
We also tested for common activity in the two feedback groups using conjunction analysis. Similar to the standard group, the inverted feedback group showed correlations between midbrain and non-midbrain DRT in the precuneus, middle temporal gyrus, insula, thalamus, and parahippocampal gyrus (Supplementary Table 4). These common areas appear to reflect non-specific regulation activity and may be associated with memory and introspection processes.
Reinforcement learning: Reduced relation between dlPFC and midbrain temporal difference signals during neurofeedback training correlates with midbrain DRT (aim 2)
To investigate whether reinforcement learning mechanisms contribute to successful self-regulation transfer, we assessed the temporal differences in the SN/VTA feedback signal as proxy for the temporal difference signal during the neurofeedback training runs. We reasoned that temporal differences should decrease from early to late phases of neurofeedback training (at least in successful regulators, see next paragraph). Thus, we assumed that at any time during neurofeedback training, participants came up with their own predictions of the upcoming feedback signal and compared the predictions with actual feedback at the next time point. Similarly, in temporal difference learning models, errors in reward prediction are calculated at each moment in time37. Therefore, we operationalized these temporal differences by subtracting the immediately preceding SN/VTA activity (prediction) from the present SN/VTA activity (outcome: Supplementary Fig. 3). The basic contrast of temporal difference coding, i.e. without correlation to midbrain DRT, revealed striatal activity, a well established finding in the learning literature38,39.
Next, we tested for a negative correlation of non-midbrain DRT with the difference in SN/VTA temporal difference signals between late and early training. In other words, for successful regulators, we expected to find a negative correlation between the decrease in midbrain temporal difference signal over the course of the neurofeedback training and activity in other brain regions. We found such a relation with gradually decreasing SN/VTA temporal difference signals in dlPFC (Fig. 4 and Supplementary Table 5). To interrogate the finding in more detail, we also analysed the two neurofeedback training runs separately. This analysis confirmed that only successful regulators showed a less pronounced relation between dlPFC activity and SN/VTA temporal difference error signals in late compared to early training (see Supplementary Fig. 4 for run-wise analysis in dlPFC). Importantly, it should be noted that this decrease in the relation with learning-related error signals correlated with SN/VTA regulation success.
Learning-related functional coupling of dlPFC with SN/VTA
Following on from our finding of the decreasing relation between SN/VTA temporal difference error coding and dlPFC activity being indicative of individual success of regulating the dopaminergic midbrain, we performed a functional connectivity analysis to investigate whether the identified dlPFC region is related to the SN/VTA region our participants aimed to regulate. Thus, we used the dlPFC region showing decreasing relation with temporal difference coding during training particularly in successful regulators as a seed region and investigated whether it was coupled to the SN/VTA. Functional connectivity between the two regions increased with transfer success (Fig. 5; t(40) = 3.79, cluster extent = 16, MNI x = −2, y = −16, z = −15). In other words, midbrain DRT and dlPFC to SN/VTA connectivity correlated positively. Note that this correlation of midbrain DRT with dlPFC-SN/VTA connectivity was task-related as it was enhanced during IMAGINE_REWARD relative to REST (which served as psychological regressor) and independent of SN/VTA activity.
Individual differences in dlPFC reward sensitivity during MID task correlate with transfer success (aim 3)
In Study 2 we used the MID task to independently measure reward sensitivity and the capability to adapt to different reward contexts35. We asked whether individual measures of reward processing (measured with parametric and adaptive coding of reward related BOLD activity) are related to individual success in regulating the SN/VTA. Specifically, we tested for correlations between DRT and (i) MID reward sensitivity (sum of small and large reward parametric modulators) and (ii) MID adaptive reward coding (difference of small minus large reward parametric modulators). These two correlations both identified dlPFC (Fig. 6a). Moreover, a conjunction of these two correlations with the correlation between midbrain DRT and non-midbrain DRT revealed common neural activity in the dlPFC (center at MNI x = 40, y = 10, z = 38; Fig. 6 and Supplementary Table 6). Thus, the more successful individuals were at self-regulating SN/VTA as a result of neurofeedback training, the more sensitive they were to reward and the more strongly they adapted to different reward contexts in the MID task.
In the present work, we used data acquired from two previous rt-fMRI neurofeedback studies to characterize individual differences and processes underlying successful transfer of self-regulation of the dopaminergic midbrain after neurofeedback training. This perspective on transfer success provided insights on what distinguished individuals who were more successful at SN/VTA regulation from those who were less successful: First, the midbrain degree of regulation transfer varies across individuals and this variance is related to different patterns of neural activity during neurofeedback training and transfer. Second, in particular, we found a significant relation between transfer success and increases in post- minus pre-training activity in the cognitive control network. Third, we found four correlations with stronger degree of midbrain regulation transfer: (i) decreasing relation between dlPFC activity and SN/VTA temporal difference error signals during neurofeedback training, (ii) stronger connectivity of dlPFC with the SN/VTA for reward imagination compared to rest during transfer, (iii) stronger reward sensitivity in dlPFC and (iv) stronger adaptive reward coding in dlPFC in the independent MID task. Fourth, control analyses revealed that alternative forms of learning (sensitization or desensitization) could not explain our findings.
Together, our study suggests that neurofeedback control of the dopaminergic midbrain relies on the cognitive control network and that the predictability of the upcoming feedback indexed by reinforcement learning signal contributes to successful neurofeedback training. Sustained self-regulation skills and the generalization of learning after neurofeedback training are key elements for practical applications and remain one of the major challenges in rt-fMRI neurofeedback research40. Pioneering neurofeedback studies of the reward system found little transfer on average12,41,42. Only one study13 reported significant post-minus pre-training activity in the VTA, and increased mesolimbic network connectivity. This study used only the VTA as target region for neurofeedback and included visual and false-feedback rather than inverted control groups. The instructions given to the participants in this study differed from those of the other studies and our study mainly in the recommended strategies (see detailed comparison in Supplementary Note 1). These differences may partly explain why the latter study found a group effect but the others did not. Nonetheless, our findings yield the same direction on individual transfer success levels as the finding from MacInnes and colleagues (2016): successful regulators learn to increase dopaminergic signaling. We go beyond this work in several ways. First, the previous studies focused exclusively on self-regulation of one a priori target region, such as SN/VTA, instead of investigating transfer effects within the whole brain. Second, transfer effects were examined at the group-level, which did not reflect the individual learning success. Therefore, in the present study we overcome these limitations by taking advantage of an individual measure of transfer success (midbrain DRT) and analysing the whole brain (non-midbrain DRT). Transfer success was distributed in our study, in line with studies using other neurofeedback modalities such as EEG16,43,44,45,46. Our findings were specific to the SN/VTA as indicated by the control analysis with the parahippocampus and we avoided, or accounted for, potential confounding variables, such as practice and familiarity, personality measures and strategies. Thus, midbrain DRT as measured directly after neurofeedback training, meaningfully captures the individual capability of SN/VTA self-regulation.
One insight of the present study is that transfer success associates with non-midbrain neural activity in cognitive control network areas36,47, such as dlPFC and ACC. The lack of cognitive control engagement within the control group and the correlation of midbrain DRT with the slope of SN/VTA increase during training in the standard feedback group suggests a preferential relation between successful transfer and learning. The cognitive control network includes regions that have been associated with feedback-related information processing during training48,49. Together, these findings suggest that the same regions contribute to acquisition and transfer of neurofeedback and that sustained post-training self-regulation generalizes across a functional network of different brain regions. Intriguingly, similar networks have been reported in skill learning and future studies may therefore investigate commonalities between neurofeedback and particularly cognitive skill learning, taking into account the specific temporal dynamics of both functions25,50.
The finding that individuals with more successful regulation of the dopaminergic midbrain show stronger activation of cognitive control areas during transfer speaks to our understanding of how individual differences in cognitive control affect emotion regulation51,52,53,54. For example the working memory component of cognitive control has been shown to predict negative affect reduction through reappraisal and suppression55. Interestingly, dopamine action (particularly at D1 receptors) in dlPFC sustains working memory performance56. Thus, it is conceivable that frontolimbic loops contribute to successful transfer. In any case, this notion converges with our finding of dlPFC-SN/VTA coupling being related to transfer success.
Future research might explore whether the positive non-midbrain training effects in the cognitive-control network also have implications for transdiagnostic clinical applications. First, combining rt-fMRI neurofeedback training with different forms of psychotherapy such as cognitive behavioral therapy57, dialectical behavioral therapy58, or psychodynamic therapy59,60,61 could improve emotion regulation deficits prevalent in several psychiatric disorders including substance use disorders, depression, anxiety and personality disorders. It has already been shown that in patients suffering from depression, neurofeedback training can be a successful tool to re-stabilize modulation of the amygdala and increase its responsivity to reward62. It remains a question for future patient studies if such training also enables re-stabilization of cognitive control. With particular attention to substance use disorders, maladaptive changes in neuroplasticity within the cognitive control network are closely associated with loss of control and compulsive drug-seeking63,64,65. In these patients, neurofeedback training might be able to directly target the biological correlates and reinstate function of the cognitive-control network and thereby of the SN/VTA.
While our first aim investigated a sustained form of dopaminergic responses during transfer, our second aim operationalized the first derivate of the sustained modulation in dopaminergic midbrain as temporal difference signal. This allowed us to investigate a more phasic form of responses outside the dopaminergic midbrain during training. We found a reduction in the relation between the decreasing SN/VTA temporal difference signal and dlPFC activity over the course of the neurofeedback training for successful regulators only, while this relation remained high for non-regulators. This finding suggests that a temporal difference-like signal might be tracked by the dlPFC, which is compatible with the notion that temporal difference error-driven reinforcement learning was more pronounced in regulators than non-regulators and provides empirical evidence for previous theoretical proposals on the principles of neurofeedback learning independent of feedback modality25. Thus, reinforcement learning provides a framework for understanding how neurofeedback works (by reducing temporal difference signals regarding future feedback). Future research may want to investigate whether the rich theoretical and empirical tradition of reinforcement learning66 can be harnessed to facilitate neurofeedback training.
We found that successful SN/VTA self-regulation is associated with increased functional coupling between dlPFC regions related to temporal difference signals and the dopaminergic midbrain. This coupling fits well with anatomical connections between dlPFC and the dopaminergic midbrain19,21 as well as effective connectivity studies on motivation24 and animal studies on prefrontal regulation of midbrain activity20,67. While human work primarily focused on coupling between the prefrontal cortex and the striatum68,69,70, animal work has documented both direct (glutamatergic) and indirect (through inhibitory interneurons) projections from prefrontal cortex to dopaminergic neurons19,71,72. These projections modulate event-evoked VTA activation consistent with phasic fMRI signals documented in humans24, animals22,73,74, and viral tracing/optogenetic stimulation studies75. Our data are in line with these findings and suggest comparable underlying mechanisms of dopamine release by volitional midbrain self-regulation.
At the functional level, a recent study on creative problem solving in humans associates dlPFC with experiencing a moment of insight76. According to this effective connectivity study, dlPFC could upregulate the VTA/SN via striatal connections during such a moment. On the other hand, in trials where no solution was found for a given problem, also no significant connectivity was observed. Our study reinforces the notion that dlPFC-SN/VTA connectivity plays an important role in self-guided motivation and in internal reward processing. Our finding points to the possibility that cognitive and affective mechanisms associated with different experiences also involve different neural pathways. Future studies may want to investigate to what degree individual differences in the functional architecture of brain networks77 influence these internal reward mechanisms and to which degree different strategies can influence neurofeedback training success.
Our independent reward task revealed that individual differences in prefrontal reward sensitivity and efficient adaptive reward coding were associated with successful SN/VTA self-regulation. Adaptive coding of rewards captures the notion that neural activity (output) should match the most likely inputs to maximize efficiency and representational precision78. Accordingly, we previously showed that reward regions encode a small range of rewards more sensitively than the large range of rewards79,80. Interestingly, in the present study, participants who were more sensitive to small rewards were also more successful in self-regulation of the dopaminergic midbrain. When participants in a typical neurofeedback training paradigm succeed at increasing the activity of the self-regulated area, the ensuing change in visual stimulation (positive neurofeedback) may constitute a small reward. By extension, adaptive reward coding may therefore provide a useful handle on identifying regulators. Moreover, future neurofeedback experiments should consider scaling the feedback signal to avoid sensitivity limitations, particularly in individuals with reduced adaptive coding.
A potential limitation of our study is that we used a combined mask for SN and VTA even though differences in functionality and anatomy have been reported for the two regions (reviewed e.g. by Trutti et al.81), with the SN more related to motor functions and the VTA to reward functions. However, it should be kept in mind that when viewed through the lens of recording and imaging rather than lesion techniques the differences between regions are more gradual than categorical82. Still, future studies may want to use more specific feedback from one or the other region to more specifically target potential differences in functions. Further limitations are that only inverted feedback is available here as control group and that this group has a smaller sample size. With regard to the interpretation of the results from the inverse feedback control, it is important to note that we controlled for explicitly stated strategies. But it is of course conceivable that some strategies are harder to become aware of and express in words than others and that this issue particularly applied to the inverted feedback group. One could speculate that the individual differences in the relation between amygdala and midbrain DRT in the control group reflect differences in frustration. Although a different brain network appears to be related to midbrain DRT in this group (Supplementary Fig. 5), this interpretation requires further investigation because the control group did not perform a tailored behavioral assessment of frustration after the experiment. As stated by Sorger and colleagues83, including measures of frustration is considered critical in the design of neurofeedback studies nowadays. Additional control groups receiving no or noninformative feedback could help assess the effects of neurofeedback training in situations where participants cannot learn anything, although these control groups suffer from other issues, such as reduced contingency between regulation efforts and signal change, which may lead to disengagement83. Still, our data show a significant correlation between degree of regulation transfer and training runs only for the standard feedback group and not for the control group. Moreover, other neurofeedback studies have shown that volitional self-regulation of brain activity can only be learned when real feedback is presented84 and that other control groups failed to acquire VTA self-regulation13. Nonetheless, future studies investigating such control groups will be necessary to replicate these findings. In general, we recommend for future studies to take current best practice guidelines83,85 into account when designing neurofeedback experiments to overcome such limitations and maximize replicability of findings.
In conclusion, in a series of analyses integrating multiple fMRI measures and computational learning parameters we were able to parse out individual factors contributing to successful transfer of midbrain self-regulation after neurofeedback training. One such factor was activity in the cognitive control network, particularly dlPFC. Future studies could employ cognitive control activity during neurofeedback training to boost success rates and clinical outcomes. Furthermore, our findings suggest that associative learning contributes to real-time fMRI neurofeedback effects. Finally, we show that higher individual reward sensitivity in the dlPFC increases the chance of neurofeedback training success. Patients with reduced neural reward sensitivity may therefore benefit from careful scaling of the neurofeedback information to equalize the subjective value of the reward across participants.
Fifty-nine right-handed participants (45 males, average age 28.25 ± 5.25 years) underwent SN/VTA neurofeedback training. We analysed data from two independent projects, which used highly similar rt-fMRI paradigms, rt-fMRI software and scanner hardware. The first dataset12 comprised male participants, randomly assigned to one of two groups. The experimental group received three runs of standard neurofeedback (N = 15), the control group received inverted neurofeedback (N = 16) as training signal. The second dataset14 comprised the healthy control participants (N = 28, 14 males) of a project investigating also cocaine users (these data are not presented here). This group received two runs of standard neurofeedback. A subset of the participants in the second dataset (N = 25) also performed a variant of the monetary incentive delay (MID) task35. In both studies, participants were recruited from the same age range of 24 to 35 years. Study 1 recruited healthy, non-smoking participants from a departmental database of university students. In Study 2 healthy participants were recruited via online advertisement and matched on regard to sex, age, and nicotine consumption to an inpatient group of patients with cocaine addiction from the Psychiatric University Hospital. Exclusion criteria were clinically relevant somatic diseases, head injury or neurological disorders, family history of schizophrenia or bipolar disorder, and use of prescription drugs affecting the central nervous system. Additional exclusion criteria for both study groups were MRI ineligibility due to non-removable ferromagnetic objects in the body, claustrophobia, or pregnancy. All participants provided written informed-consent and received compensation for their participation. The Zurich cantonal ethics committee approved these studies in accordance with the Human Subjects Guidelines of the Declaration of Helsinki.
Experimental setup and neuroimaging
All participants underwent neuroimaging in a Philips Achieva 3 T magnetic resonance (MR) scanner using an eight channel SENSE head coil (Philips, Best, The Netherlands) either at the Laboratory for Social and Neural Systems Research Zurich (SNS Lab, Study 1) or the MR Center of the Psychiatric Hospital of the University of Zurich (Study 2). First, we acquired anatomical images (Study1: gradient echo T1-weighted sequence in 301 sagittal plane slices of 250 × 250 mm2 resulting in 1.1 mm3 voxels; Study2: spin-echo T2-weighted sequence with 70 sagittal plane slices of 230 × 184 mm2 resulting in 0.57 × 0.72 × 2 mm3 voxel size) prior to neurofeedback training and loaded them into BrainVoyager QX v2.3 (Brain Innovation, Maastricht, The Netherlands) to identify SN/VTA as target region (see section on SN/VTA region-of-interest for details). For the functional scans, we used 27 ascending transversal slices in a gradient echo T2*-weighted whole brain echo-planar image sequence in both studies. The in-plane resolution was 2 × 2 mm2, 3 mm slice thickness and 1.1 mm gap width over a field of view of 220 × 220 mm2, a TR/TE of 2000/35 ms and a flip angle of 82°. Slices were aligned with the anterior–posterior commissure and then tilted by 15°. Functional images were converted from Philips par/rec data format to ANALYZE and exported in real-time to the external analysis computer via the DRIN software library provided by Philips. This external computer ran Turbo BrainVoyager v3.0 (TBV – Brain Innovation, Maastricht, The Netherlands) to extract the BOLD signal from the images and calculate the neural activation for the feedback signal. The visual feedback signal was presented using custom-made software with Visual Studio 2008 (Microsoft, Redmond, WA, USA) through either a mirror mounted at the rear end of the scanner bore (Study 1) or through MR compatible goggles (Study 2).
The participants were instructed that their goal was to control a reward-related region in their brains by imagining rewarding stimuli, actions, or events. We have previously shown that reward imagination activates SN/VTA with conventional fMRI86. Prior to scanning, we provided examples of such rewards, including palatable food items, motivating achievements, positive experiences with friends and family, favourite leisure activity or romantic imagery. We encouraged participants to use these different rewards as potential strategies for upregulating reward-related activity during the cue ‘Happy Time!’, here referred to as IMAGINE_REWARD condition. In contrast, during the cue ‘Rest’ (here referred to as REST condition), participants were asked to perform neutral imagery, such as mental calculation to reduce reward-related activity. In both conditions, the real-time SN/VTA BOLD signal was continuously fed back to the participant in the form of a smiley that translated vertically in proportion to the signal (Fig. 7). Prior to training, participants were familiarized with the 5 s delay of the hemodynamic response affecting the display of the feedback and were asked not to move or change their breathing during the neurofeedback training. The control group received identical instructions and was debriefed after the session about the inversion of the feedback signal.
Each neurofeedback session comprised: a pre-training imagery baseline run without any feedback, three (Study 1) or two (Study 2) training runs during which neurofeedback was presented (as Study 2 also investigated patients, training was limited to two runs), and a transfer run (i.e., without feedback). Each of these runs comprised nine blocks of IMAGINE_REWARD and REST conditions, each lasting 20 s. To determine the current level of the feedback signal in the neurofeedback sessions, we used the average of the last five volumes of the previous REST condition as reference value (minimizing drift and motion effects) and employed a moving average of the previous three volumes to reduce noise. In the standard feedback group, the smiley moved up with increasing percent signal change in the SN/VTA BOLD signal and changed colour from red to yellow (Fig. 7a). In the inverted feedback group, the smiley moved up and turned yellow with decreasing SN/VTA BOLD signal.
SN/VTA region-of-interest (ROI)
In both studies, the target region for neurofeedback, i.e. the substantia nigra (SN) and ventral tegmental area (VTA), was structurally identified using individual anatomical scans. Since the individual mask definition slightly differed between Study 1 and 2 (T1-weighted scans in Study 1 and T2-weighted scans in Study 2), we used an independent mask for our post-hoc analysis. By this, we can control for individual differences between experimenter ROI selection strategies, avoid interpolation confounds due to warping by normalization and use a reliable seed region for functional connectivity analysis. Specifically, we used the probabilistic mask of the SN and VTA as defined by87, which is based on a large sample (148 datasets) and available on https://www.adcocklab.org/neuroimaging-tools (download August 2018). Figure 7b illustrates the mask within the brain. From this mask, we extracted and averaged SN/VTA activity for each participant using custom-made scripts in Matlab R2016b.
Degree of regulation transfer (DRT)
We assessed the effects of individual differences in performance to characterise participants on a continuous regulation scale. The measure of successful self-regulation was defined as individual degree of regulation transfer (DRT). We calculated DRT both for the midbrain and non-midbrain regions, i.e. as the condition-specific signal difference between post-training (Transfer) and pre-training (Baseline) runs:
Thus, a positive DRT corresponds to a relative increase in post-training (midbrain or non-midbrain) BOLD activity compared to pre-training BOLD activity for the contrast IMAGINE_REWARD minus REST. It is essential to note that during these two runs (pre-training baseline, post-training transfer) no neurofeedback was presented. The absence of feedback ensures comparability between participants in the different groups, as the perception and processing of the feedback signal during the training runs might be different and influence the SN/VTA signal itself.
Midbrain DRT distributions
To investigate potential group differences in midbrain DRT, we transferred the extracted SN/VTA data to R (R-project R3.4.1). Using an ANOVA and a non-parametric Kruskal–Wallis test, we tested for differences between the three groups (i.e. the two groups receiving standard feedback in Studies 1 and 2 and the control group receiving inverted feedback in Study 1).
Whole-brain correlations with midbrain DRT in fMRI analysis
To investigate DRT outside the midbrain, we tested for individual differences in successful transfer at the whole brain level (non-midbrain DRT). Thus, non-midbrain DRT identified regions that were positively associated with midbrain DRT and thereby potentially contributed to regulation of the SN/VTA. This approach goes beyond previous research which contrasted average effects in healthy participants or in healthy controls against patient groups12,13,14.
Control analyses and measures
To validate our approach of investigating individual differences in midbrain DRT and relating them to non-midbrain DRT, we performed several control analyses and took control measures. First, we performed a control analysis which assessed the spatial specificity of our effects. To do so, we correlated whole brain activity with a region other than the SN/VTA. Specifically, we used the neighbouring parahippocampus (Supplementary Note 2). In keeping with specificity, this control analysis revealed little commonality (limited to the cerebellum and temporal gyrus) with the SN/VTA analysis (Supplementary Fig. 6 and Supplementary Table 7). Note that if unspecific individual differences would explain our main findings then one would expect similar results in the control analysis, contrary to what we find. Second, we ascertained that midbrain DRT correlated with midbrain neurofeedback training. To achieve positive SN/VTA transfer effects, participants had to apply what they had learned during neurofeedback training runs. We therefore tested whether midbrain DRT was related to SN/VTA activity increase during the training runs by calculating the correlation between them. Specifically, we determined the slope of SN/VTA signal increase over training (i.e., the averaged difference of IMAGINE_REWARD – REST blocks in the second neurofeedback training run – first neurofeedback training run) and related it to midbrain DRT in Spearman’s correlations for the standard and inverted feedback group. Third, we included only participants without prior experience with neurofeedback to control for practice and familiarity differences. Moreover, we performed a control analysis to check whether individuals with successful subsequent transfer differ already at the baseline session by correlating the IMAGINE_REWARD-REST contrast at baseline only with midbrain DRT. Using identical family-wise error cluster-level correction as for correlation with transfer-baseline revealed no findings within the cognitive control network at baseline only. Moreover, inclusively masking the baseline findings with transfer-baseline findings revealed no overlap of the two analyses. Thus, individual differences in midbrain DRT reflect primarily what participants learned during the neurofeedback training sessions rather than pre-existing individual differences. Fourth, we controlled for personality measures and strategies employed by participants. We note that we used (mean-centered) midbrain DRT as continuous scale with the aim of investigating the mechanisms underlying successful self-regulation performance at the individual level beyond the SN/VTA. Accordingly, we excluded SN/VTA from non-midbrain DRT analyses, which also avoided any circularity. We also note that using within-study normalization of DRT as an alternative control for study effects left all inferences unchanged.
In addition to the neurofeedback training, the participants of Study 2 (N = 25) performed a MID task that captures differences in adaptive and general reward sensitivity. Participants performed this task after the neurofeedback training and after a break of approximately 45 min during which they left the scanner. In every trial of the MID task35,80,88 first one of three cues appeared (Supplementary Fig. 7). One cue was associated with large reward (ranging from 0 to 2.00 CHF), one cue with small reward (0 to 0.40 CHF) and one cue with no reward. After a delay of 2.5 to 3 s, participants had to identify an outlier from three circles by pressing one of three buttons as quickly as possible. Depending on the cue, their response time and the correctness of the answer, participants gained an amount of money. Importantly, the use of large and small reward ranges enables investigation of individual differences not only in general reward sensitivity but also in how well the reward system adapts to different reward distributions, so-called adaptive reward coding35. To investigate whether reward sensitivity in the MID task results from (lack of) regulation success, we correlated degree of regulation transfer with total payment in the MID task. We found no significant relation between performance in the two tasks (Pearson’s r = 0.0332, p = 0.88). Thus, we found little evidence to support the notion that reward sensitivity in the MID can be explained by preceding self-regulation success.
MR Data pre-processing
We despiked the functional data using the AFNI toolbox (National Institute of Mental Health; http://afni.nimh.nih.gov/afni). To account for differences in echo-planar-image (EPI) slice acquisition times we employed temporal interpolation of the MR signal, shifting the signal of the misaligned slices to the first slice89 using FSL 5 (FMRIB Software Library, Analysis Group, FMRIB, Oxford, http://fsl.fmrib.ox.ac.uk). Furthermore, data were bias-field corrected using ANTs (Advanced Normalization Tools; http://stnava.github.io/ANTs), realigned using FSL 5, normalized to standard Montreal Imaging Institute (MNI) space using ANTs in combination with a custom scanner-specific EPI-template resulting in a 1.5 mm3 isotropic resolution and finally smoothed with a 6 mm full-width-half-maximum Gaussian kernel using FSL 5.
The spatial specificity control analyses (Supplementary Fig. 6 and Supplementary Table 7) suggest that our findings are not due to common physiological noise. To more directly account for noise, we additionally acquired physiological data in a subsample of participants. In the available subsample, neither changes in heart rate variability nor respiration were significantly correlated with VTA/SN activation during reward imagination (see details in refs. 14, Supplemental Material Supplementary Table 1, Supplementary Fig. 1). Here, we also used an image-based correction to account for physiological artefacts in all participants. Since physiological artefacts are most prominently present in cerebrospinal fluid and white matter due to the absence of BOLD effects, pulsations of the ventricles, and proximity to the large brain arteries (e.g., circle of Willis), we decided to use an established preprocessing procedure based on a principal component analysis (PCA) approach90,91. Specifically, we calculated the global mean and the first six components of a temporal principal component analysis on the cerebrospinal fluid and white matter signal. These six components were used as noise regressors in the first-level statistics (see section “MR Data analysis”) in addition to the six motion parameters. Along with the pre-processing of the fMRI data, the SN/VTA mask used as ROI for the analysis was resliced into the dimensions of the functional data using SPM 12 (v6906, Wellcome Trust Centre for Neuroimaging, UCL, London, UK; http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) within Matlab R2016b (Mathworks, Sherborn, MA, USA).
MR Data analysis
Non-midbrain DRT correlation with midbrain DRT in standard and inverted feedback group (aim 1)
The first question of this study asked whether the individual degree of successful SN/VTA neurofeedback transfer is associated with individual differences in the cognitive control network. To answer this question, we conducted a general linear model (GLM) on the single subject level including one block-wise regressor for the IMAGINE_REWARD condition and one for the REST condition with 190 timesteps (each condition comprised 9 onsets and lasted 20 s) for each of the four runs separately. Additionally, we modelled the first 5 TRs of every run as nuisance regressor and added also motion and physiological artefact regressors (see methods section for MR Data pre-processing) in the design matrix. In total the GLM consisted of fifteen regressors. We formed the contrast IMAGINE_REWARD-REST and compared it between Transfer and Baseline runs outside SN/VTA, i.e. non-midbrain DRT=(IMAGINE_REWARD-REST)Transfer – (IMAGINE_REWARD-REST)Baseline.
At the group level, we tested for correlation of non-midbrain DRT with midbrain DRT. We ran these analyses in all voxels for both the standard and inverted feedback groups. To test for common and separate activity between the groups, we performed conjunction and disjunction analyses over the two group maps. Additionally, we performed a two-sample t-test to search for significant differences between groups. To identify activity within the cognitive control network, we used a cognitive control template based on the coordinates from a meta-analysis36. We created this template with fslmaths and spheres of 15 mm around all coordinates from the meta-analysis. In Supplementary Table 1 we identify regions of the cognitive control network where non-midbrain DRT correlates with midbrain DRT within the template. For statistical maps, we used an FWE-corrected cluster level threshold, p < 0.05 (cluster extent of 230 voxels) following a cluster-inducing voxel level threshold of p < 0.001 (uncorrected). In addition, to test the functional specificity of our results, we performed a meta-analytic functional decoding analysis using the Neurosynth database (www.neurosynth.org). This relates the neural signatures of the cognitive control decoding network to other task-related neural patterns (Supplementary Fig. 5).
Temporal difference coding during NF training (aim 2)
The second question of the study asked whether successful neurofeedback performance was associated with a reduction in temporal difference coding during the training runs as captured by a classic reinforcement learning framework. There, learning corresponds to reducing prediction errors by adjusting predictions until they match experienced reward as much as possible. Mathematically, temporal differences can be viewed here as the first derivative of the observed BOLD signal in the midbrain. Traditionally, reinforcement learning theories compute the error term with the form defined by Sutton & Barto50:
Rt+1 is the next reward, V(St) and V(St+1) are current and next reward predictions, γ is a discount parameter (typically estimated to be 1 or close to 1 as t is short). In a task with continuous feedback (smiley height) like ours, Rt+1 and V(St+1) collapse, such that the error term becomes δt = Rt+1 - V(St), which corresponds to the difference of subsequent feedback states (note that actions are not observable in neurofeedback training). In other words, the error term is the (continuously evolving) incongruence between the current feedback state, which incorporates participants’ expectations about the upcoming feedback, and the next state, i.e. the actually presented feedback signal. As the highest available temporal resolution to compute the error term in our paradigm is one TR, we decided to approximate it with this resolution. However, we do not mean to imply that the brain is limited to that resolution. These temporal difference errors should be high when using unpracticed mental strategies at the beginning of the experiment. Over the time course of the neurofeedback training block, the temporal difference signal within the midbrain should decrease for participants who learn to successfully self-regulate the activity of their SN/VTA. For these participants, the change in height of the smiley should become more predictable.
To investigate temporal difference signals in non-midbrain regions in our paradigm, we constructed an additional GLM for the neurofeedback training runs and modelled the regulation conditions (IMAGINE_REWARD and REST) during neurofeedback training runs as event-related regressors for every TR (in contrast to the block-wise previous analysis). We parametrically modulated these regressors with a time-resolved continuous temporal difference term. This term was defined as difference between the current and the previous TR within the SN/VTA mask, i.e. the parametric modulator corresponded to the difference in the BOLD signal of the SN/VTA from IMAGINE_REWARDt –IMAGINE_REWARDt-1. We formed this parametric modulator separately for the upregulate and the rest condition and analysed it run-wise to investigate brain regions (excluding SN/VTA) showing a correlation with temporal difference in SN/VTA. We then investigated (Supplementary Fig. 3) at the single subject-level if the relation to this temporal difference information decreased over time by using the difference between the parametric temporal difference modulator of the second neurofeedback training run and the parametric temporal difference modulator of the first neurofeedback training run, i.e. temporal difference error coding in later training minus earlier training. This difference should become negative as temporal differences decrease with learning and outcomes become more predicted. Finally, on the group level, we correlated this contrast (between-run difference in temporal difference coding) with non-midbrain DRT in a one-sample t-test to test for whole-brain associations between a decrease in temporal difference coding and successful midbrain self-regulation.
The results of this analysis, showing an association with decreasing temporal difference coding in the dorsolateral prefrontal cortex (dlPFC), inspired a functional connectivity analysis. Specifically, we investigated the functional impact of temporal difference coding in dlPFC on the SN/VTA using a psychophysiological interaction analysis using the gPPI v13 Toolbox92 based on the MNI coordinate of dlPFC (x = 40, y = 10, z = 38) with a 5 mm sphere as seed region. We added activity from this seed region as physiological regressor to the original GLM and interacted it with both the IMAGINE_REWARD and REST regressors to form interaction regressors. Functional connectivity was calculated by contrasting the interaction terms IMAGINE_REWARD-REST between second and first neurofeedback training run. We then correlated this contrast with midbrain DRT. The results were focused to the SN/VTA region as target. For statistical maps, we used a whole-brain threshold of p < 0.001 (20 voxel extent).
Add-on analysis Dynamic Causal Modelling (DCM)
In addition to functional connectivity, we also investigated task-dependent effective connectivity between SN/VTA and dlPFC during the second neurofeedback training run related to the upregulation of the dopaminergic midbrain. The full DCM analysis is reported in Supplementary Note 3. The results suggest that successful self-regulation appears to benefit from some inhibitory modulation of SN/VTA by prefrontal cortex.
As an alternative form of learning, we tested for non-associative mechanisms and performed two additional analyses testing whether (de-)sensitization can explain how the repetition of mental strategies relates to midbrain DRT. In particular, we assessed a linear parametric modulator for each timestep in the training sessions, both increasing or decreasing. The analysis of linear increase (sensitization) itself revealed a positive correlation with midbrain DRT within the left hippocampus (x = −35, y = −26, z = −7, uncorrected p < 0.001, cluster size = 30). In contrast, a linear decrease (desensitization) revealed a negative correlation with left ACC (x = 43, y = 0, z = 14, uncorrected p < 0.001, cluster size = 21). However, because the SN/VTA is more intimately related to associative than non-associative learning, we do not consider this analysis further.
Relation between non-midbrain DRT and reward sensitivity in the MID Task (aim 3)
To address the third aim of the study, we investigated the relationship between reward processing in the MID task and the capacity to successfully regulate the SN/VTA in the transfer session of the neurofeedback experiment at the whole-brain level. In particular, we considered two contrasts in the MID task, (1) general reward sensitivity, defined as the sum of parametric modulators: small plus large reward; (2) adaptive reward coding, defined as the difference between parametric modulators: small minus large reward. Again, we used correlation analysis at the group level to determine whether these two contrasts are related with individual SN/VTA transfer success (non-midbrain DRT) in the neurofeedback task. Moreover, to assess the commonalities of the neural activities in the two different tasks, we performed a conjunction analysis of contrasts (1), (2) and the correlation of transfer-activity with non-midbrain DRT. For statistical maps, we used a whole-brain threshold of p < 0.001 (20 voxel extent).
Additional behavioral measurements
We analysed available information on external behavioral scores that might explain differences in the individual transfer success.
All participants were introduced to five example strategies (see section about neurofeedback paradigm) they could use to upregulate brain activity but were also free to use their own strategies. At the end of the experiment, participants filled in a custom-made questionnaire on the strategies they used. To compare strategies between the groups, we used a χ2-test that assessed differences in the distribution of strategy usage. We did not observe any significant group differences in strategy use (p = 0.9), and therefore did not consider this measurement in any further analysis (Supplementary Table 8).
To control for the possibility that individual differences in behavior and personality were associated with individual differences in DRT, Study 2 measured: (1) Smoking status in number of cigarettes per day; (2) verbal IQ as determined by the Multiple Word Test (MWT93); (3) Positive and Negative Affect Score (PANAS) in the German version94; (4) attentional and nonplanning subscores of the Barratt Impulsivity Scale in the German version95. None of these variables correlated significantly with midbrain DRT in a Pearson’s correlation test (all p > 0.5) and the correlations reported in the results section were robust to including the variables as covariates of no interest.
Statistics and reproducibility
For all of the reported analyses, we used the toolbox SPM 12 (v6906) within Matlab R2016b. All figures were created using bspmview v.2016110896 and ggplot2 within R 3.4.1. All group-level analyses included an additional covariate for the dataset to account for potential global signal differences between studies. To estimate the robustness of our results, we performed bootstrapping analyses for all correlation analyses and summarized these in Supplementary Table 10. To validate the findings of this secondary data analyses, we performed a confirmatory analysis separating the two self-regulation conditions IMAGINE_REWARD and REST over the course of the study. Because the participants were instructed to perform mental calculations during REST, which is an active task, the results of these analyses indicate that the results of the main analysis involving the contrast (IMAGINE_REWARD-REST) are not simply driven by a decrease during REST (Supplementary Note 4).
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Second level statistical map data supporting the findings of this study are available at https://identifiers.org/neurovault.collection:12684. All other source data, such as data extractions from regions of interest and mask files, for this paper and the Supplemental Material are provided with this paper in the file supplementary_data_1.zip. A reporting summary for this Article is available as a Supplementary Information file.
Matlab and R Code supporting this publication is publicly available at: https://github.com/lydiatgit/NFLearning_SNVTA_PublicRepo (https://zenodo.org/badge/latestdoi/391012761)97.
Schultz, W. Predictive Reward Signal of Dopamine Neurons. J. Neurophysiol. 80, 1–27 (1998).
Schultz, W. Dopamine reward prediction error coding. Dialogues Clin. Neurosci. 18, 23–32 (2016).
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science (1979) 275, 1593–1599 (1997).
Tobler, P. N., Fletcher, P. C., Bullmore, E. T. & Schultz, W. Learning-Related Human Brain Activations Reflecting Individual Finances. Neuron 54, 167–175 (2007).
Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Dopamine in Motivational Control: Rewarding, Aversive, and Alerting. Neuron 68, 815–834 (2010).
Wise, R. A. Dopamine, learning and motivation. Nat. Rev. Neurosci. 5, 483–494 (2004).
Friston, K. et al. The anatomy of choice: Dopamine and decision-making. Philosophical Transactions of the Royal Society B: Biol. Sci. 369, (2014).
Huys, Q. J. M., Tobler, P. N., Hasler, G. & Flagel, S. B. The role of learning-related dopamine signals in addiction vulnerability. Prog. Brain Res. 211, 31–77 (2014).
Deserno, L., Schlagenhauf, F. & Heinz, A. Striatal dopamine, reward, and decision making in schizophrenia. Dialogues Clin. Neurosci. 18, 77–89 (2016).
Maia, T. V. & Frank, M. J. An Integrative Perspective on the Role of Dopamine in Schizophrenia. Biol. Psychiatry. 81, 52–66 (2017).
Meder, D., Herz, D. M., Rowe, J. B., Lehéricy, S. & Siebner, H. R. The role of dopamine in the brain - lessons learned from Parkinson’s disease. NeuroImage 190, 79–93 (2019).
Sulzer, J. et al. Neurofeedback-mediated self-regulation of the dopaminergic midbrain. Neuroimage 75, 176–184 (2013).
MacInnes, J. J., Dickerson, K. C., Chen, N. K. & Adcock, R. A. Cognitive Neurostimulation: Learning to Volitionally Sustain Ventral Tegmental Area Activation. Neuron 89, 1331–1342 (2016).
Kirschner, M. et al. Self-regulation of the dopaminergic reward circuit in cocaine users with mental imagery and neurofeedback. EBioMedicine 37, 489–498 (2018).
Klein, M. O. et al. Dopamine: Functions, Signaling, and Association with Neurological Diseases. Cell. Mol. Neurobiol. 39, 31–59 (2019).
Alkoby, O., Abu-Rmileh, A., Shriki, O. & Todder, D. Can We Predict Who Will Respond to Neurofeedback? A Review of the Inefficacy Problem and Existing Predictors for Successful EEG Neurofeedback Learning. Neuroscience 378, 155–164 (2018).
Sitaram, R. et al. Closed-loop brain training: the science of neurofeedback. Nat. Rev. Neurosci. 18, 86–100 (2016).
Wu, J. et al. Cortical control of VTA function and influence on nicotine reward. Biochemical Pharmacol. 86, 1173–1180 (2013).
Frankle, W. G., Laruelle, M. & Haber, S. N. Prefrontal cortical projections to the midbrain in primates: Evidence for a sparse connection. Neuropsychopharmacology 31, 1627–1636 (2006).
Gao, M. et al. Functional Coupling between the Prefrontal Cortex and Dopamine Neurons in the Ventral Tegmental Area. J. Neurosci. 27, 5414–5421 (2007).
Sesack, S. R., Carr, D. B., Omelchenko, N. & Pinto, A. Anatomical Substrates for Glutamate-Dopamine Interactions: Evidence for Specificity of Connections and Extrasynaptic Actions. in Annals of the New York Academy of Sciences. 1003 36–52 (John Wiley & Sons, Ltd (10.1111), 2003).
Takahashi, Y. K. et al. Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nat. Neurosci. 14, 1590–1597 (2011).
Ferenczi, E. A. et al. Prefrontal cortical regulation of brainwide circuit dynamics and reward-related behavior. Science 351, aac9698 (2016).
Ballard, I. C. et al. Dorsolateral Prefrontal Cortex Drives Mesolimbic Dopaminergic Regions to Initiate Motivated Behavior. J. Neurosci. 31, 10340–10346 (2011).
Birbaumer, N., Ruiz, S. & Sitaram, R. Learned regulation of brain metabolism. Trends Cogn. Sci. 17, 295–302 (2013).
Oblak, E. F., Lewis-Peacock, J. A. & Sulzer, J. S. Self-regulation strategy, feedback timing and hemodynamic properties modulate learning in a simulated fMRI neurofeedback environment. PLoS Computational Biol. 13, e1005681 (2017).
Radua, J., Stoica, T., Scheinost, D., Pittenger, C. & Hampson, M. Neural Correlates of Success and Failure Signals During Neurofeedback Learning. Neuroscience 378, 11–21 (2018).
Ferenczi, E. A. et al. Prefrontal cortical regulation of brainwide circuit dynamics and reward-related behavior. Science 351, aac9698–aac9698 (2016).
Tsai, H. C. et al. Phasic firing in dopaminergic neurons is sufficient for behavioral conditioning. Science324, 1080–1084 (2009).
D’Ardenne, K., McClure, S. M., Nystrom, L. E. & Cohen, J. D. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319, 1264–1267 (2008).
Zaghloul, K. A. et al. Human substantia nigra neurons encode unexpected financial rewards. Science 323, 1496–1499 (2009).
Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
Keiflin, R., Pribut, H. J., Shah, N. B. & Janak, P. H. Ventral Tegmental Dopamine Neurons Participate in Reward Identity Predictions. Curr. Biol. 29, 93–103.e3 (2019).
Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–5 (2005).
Kirschner, M. et al. Deficits in context-dependent adaptive coding in early psychosis and healthy individuals with schizotypal personality traits. Brain 141, 2806–2819 (2018).
Niendam, T. A. et al. Meta-analytic evidence for a superordinate cognitive control network subserving diverse executive functions. Cogn. Affect Behav. Neurosci. 12, 241–68 (2012).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. (A Bradford Book, 2018).
Chase, H. W., Kumar, P., Eickhoff, S. B. & Dombrovski, A. Y. Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis. Cogn Affect Behav Neurosci. 15, 435–459 (2015).
O’Doherty, J. P., Dayan, P., Friston, K., Critchley, H. & Dolan, R. J. Temporal Difference Models and Reward-Related Learning in the Human Brain. Neuron 38, 329–337 (2003).
Sulzer, J. et al. Real-time fMRI neurofeedback: Progress and challenges. Neuroimage 76, 386–399 (2013).
Kirschner, M. et al. Self-regulation of the dopaminergic reward circuit in cocaine users with mental imagery and neurofeedback. EBioMedicine 37, 489–498 (2018).
Greer, S. M., Trujillo, A. J., Glover, G. H. & Knutson, B. Control of nucleus accumbens activity with neurofeedback. Neuroimage 96, 237–244 (2014).
Zoefel, B., Huster, R. J. & Herrmann, C. S. Neurofeedback training of the upper alpha frequency band in EEG improves cognitive performance. Neuroimage 54, 1427–1431 (2011).
Hanslmayr, S., Sauseng, P., Doppelmayr, M., Schabus, M. & Klimesch, W. Increasing Individual Upper Alpha Power by Neurofeedback Improves Cognitive Performance in Human Subjects 1. Appl. Psychophysiol. Biofeedback. 30,1–10 (2005).
Fuchs, T., Birbaumer, N., Lutzenberger, W., Gruzelier, J. H. & Kaiser, J. Neurofeedback treatment for attention-deficit/hyperactivity disorder in children: A comparison with methylphenidate. Appl. Psychophysiol. Biofeedback 28, 1–12 (2003).
Enriquez-Geppert, S. et al. Modulation of frontal-midline theta by neurofeedback. Biol. Psychol. 95, 59–69 (2014).
Parro, C., Dixon, M. L. & Christoff, K. The neural basis of motivational influences on cognitive control. Hum. Brain Mapp. 39, 5097–5111 (2018).
Marco-Pallarés, J., Müller, S. V. & Münte, T. F. Learning by doing: An fMRI study of feedback-related brain activations. NeuroReport 18, 1423–1426 (2007).
Emmert, K. et al. Meta-analysis of real-time fMRI neurofeedback studies using individual participant data: How is brain regulation mediated? Neuroimage 124, 806–812 (2016).
Tenison, C., Fincham, J. M. & Anderson, J. R. Phases of learning: How skill acquisition impacts cognitive processing. Cogn. Psychol. 87, 1–28 (2016).
Friedman, N. P. & Miyake, A. Unity and diversity of executive functions: Individual differences as a window on cognitive structure. Cortex 86, 186–204 (2017).
Braver, T. S., Cole, M. W. & Yarkoni, T. Vive les differences! Individual variation in neural mechanisms of executive control. Curr. Opin. Neurobiol. 20, 242–250 (2010).
Buhle, J. T. et al. Cognitive reappraisal of emotion: A meta-analysis of human neuroimaging studies. Cereb. Cortex 24, 2981–2990 (2014).
Kohn, N. et al. Neural network of cognitive emotion regulation - An ALE meta-analysis and MACM analysis. Neuroimage 87, 345–355 (2014).
Hendricks, M. A. & Buchanan, T. W. Individual differences in cognitive control processes and their relationship to emotion regulation. Cognition Emot. 30, 912–924 (2016).
Arnsten, A. F. T., Wang, M. & Paspalas, C. D. Dopamine’s Actions in Primate Prefrontal Cortex: Challenges for Treating Cognitive Disorders. Pharmacol. Rev. 67, 681–696 (2015).
Beck, A. T. The current state of cognitive therapy: A 40-year retrospective. Arch. Gen. Psychiatry 62, 953–959 (2005).
Lynch, T. R., Trost, W. T., Salsman, N. & Linehan, M. M. Dialectical Behavior Therapy for Borderline Personality Disorder. Annu. Rev. Clin. Psychol. 3, 181–205 (2007).
Bateman, A. & Fonagy, P. Mentalization based treatment for borderline personality disorder. World Psychiatry. 9, 11–5 (2010).
Maroda, K. J. Psychodynamic Techniques: Working with Emotion in the Therapeutic Relationship. Journal of Phenomenological Psychology https://doi.org/10.1163/156916211x567505. (Guilford Press, 2010).
Have-de Labije, J. & Neborsky, R. Mastering intensive short-term dynamic psychotherapy: a roadmap to the unconscious. pp. 416 (Karnac Books, 2012).
Young, K. D. et al. Real-Time Functional Magnetic Resonance Imaging Amygdala Neurofeedback Changes Positive Information Processing in Major Depressive Disorder. Biol. Psychiiatry. 82, 578–586 (2017).
Koob, G. F. & Volkow, N. D. Neurocircuitry of addiction. Neuropsychopharmacology 35, 217–38 (2010).
Holmes, A. J., Hollinshead, M. O., Roffman, J. L., Smoller, J. W. & Buckner, R. L. Individual Differences in Cognitive Control Circuit Anatomy Link Sensation Seeking, Impulsivity, and Substance Use. J. Neurosci. 36, 4038–4049 (2016).
George, O. & Koob, G. F. Individual differences in prefrontal cortex function and the transition from drug use to drug dependence. Neurosci. Biobehav. Rev. 35, 232–247 (2010).
Pearce, J. Animal Learning and Cognition: An Introduction. 3rd edition, 432 (Psychology Press, 2008).
Jo, Y. S. & Mizumori, S. J. Prefrontal Regulation of Neuronal Activity in the Ventral Tegmental Area. Cereb. Cortex. 26, 4057–4068 (2016).
Schenk, L. A., Sprenger, C., Onat, S., Colloca, L. & Büchel, C. Suppression of Striatal Prediction Errors by the Prefrontal Cortex in Placebo Hypoalgesia. J. Neurosci. 37, 9715–9723 (2017).
Weber, S. C., Kahnt, T., Quednow, B. B. & Tobler, P. N. Frontostriatal pathways gate processing of behaviorally relevant reward dimensions. PLoS Biol. 16, e2005722 (2018).
Chatham, C. H., Frank, M. J. & Badre, D. Corticostriatal output gating during selection from working memory. Neuron 81, 930–942 (2014).
Sesack, S. R. & Pickel, V. M. Prefrontal cortical efferents in the rat synapse on unlabeled neuronal targets of catecholamine terminals in the nucleus accumbens septi and on dopamine neurons in the ventral tegmental area. J. Comp. Neurol. 320, 145–160 (1992).
Sesack, S. R. & Carr, D. B. Selective prefrontal cortex inputs to dopamine cells: Implications for schizophrenia. Physiol. Behav. 77, 513–517 (2002).
Parker, J. G., Beutler, L. R. & Palmiter, R. D. The contribution of NMDA receptor signaling in the corticobasal ganglia reward network to appetitive pavlovian learning. J. Neurosci. 31, 11362–11369 (2011).
Jo, Y. S., Lee, J. & Mizumori, S. J. Y. Effects of prefrontal cortical inactivation on neural activity in the ventral tegmental area. J. Neurosci. 33, 8159–8171 (2013).
Beier, K. T. et al. Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell 162, 622–634 (2015).
Tik, M. et al. Ultra-high-field fMRI insights on insight: Neural correlates of the Aha!-moment. Hum. Brain Mapp. 39, 3241–3252 (2018).
Hahn, A. et al. Individual Diversity of Functional Brain Network Economy. Brain Connectivity. 5, 156–165 (2014).
Wark, B., Lundstrom, B. N. & Fairhall, A. Sensory adaptation. Curr. Opin. Neurobiol. 17, 423–9 (2007).
Kirschner, M. et al. Deficits in context-dependent adaptive coding in early psychosis and healthy individuals with schizotypal personality traits. Brain 141, 2806–2819 (2018).
Kirschner, M. et al. Ventral striatal hypoactivation is associated with apathy but not diminished expression in patients with schizophrenia. J. Psychiatry Neurosci. 41, 152–161 (2016).
Trutti, A. C., Mulder, M. J., Hommel, B. & Forstmann, B. U. Functional neuroanatomical review of the ventral tegmental area. NeuroImage 191, 258–268 (2019).
Düzel, E. et al. Functional imaging of the human dopaminergic midbrain. Trends Neurosci. 32, 321–328 (2009).
Sorger, B., Scharnowski, F., Linden, D. E. J., Hampson, M. & Young, K. D. Control freaks: Towards optimal selection of control conditions for fMRI neurofeedback studies. Neuroimage 186, 256–265 (2019).
Hellrung, L. et al. Intermittent compared to continuous real-time fMRI neurofeedback boosts control over amygdala activation. Neuroimage 166,198-208 (2018).
Heunis, S. et al. Quality and denoising in real‐time functional magnetic resonance imaging neurofeedback: A methods review. Human Brain Mapping hbm. 25010. https://doi.org/10.1002/hbm.25010 (2020).
Miyapuram, K. P., Tobler, P. N., Gregorios-Pippas, L. & Schultz, W. BOLD responses in reward regions to hypothetical and imaginary monetary rewards. Neuroimage 59, 1692–1699 (2012).
Murty, V. P. et al. Resting state networks distinguish human ventral tegmental area from substantia nigra. Neuroimage 100, 580–9 (2014).
Simon, J. J. et al. Reward System Dysfunction as a Neural Substrate of Symptom Expression Across the General Population and Patients with Schizophrenia. Schizophrenia Bull. 41, 1370–1378 (2015).
Sladky, R. et al. Slice-timing effects and their correction in functional MRI. Neuroimage 58, 588–594 (2011).
Sladky, R. et al. High-resolution functional MRI of the human amygdala at 7 T. Eur. J. Radiol. 82, 728–733 (2013).
Weissenbacher, A. et al. Correlations and anticorrelations in resting-state functional connectivity MRI: A quantitative comparison of preprocessing strategies. Neuroimage 47, 1408–1416 (2009).
McLaren, D. G., Ries, M. L., Xu, G. & Johnson, S. C. A generalized form of context-dependent psychophysiological interactions (gPPI): A comparison to standard approaches. Neuroimage 61, 1277–1286 (2012).
Lehrl, S. Mehrfachwahl-Wortschatz-Intelligenztest MWT-B. (Spitta Verlag, Balingen, 2005).
Krohne, H. W., Egloff, B., Kohlmann, C.-W. & Tausch, A. Untersuchung mit einer deutschen Form der Positive and Negative Affect Schedule (PANAS). Diagnostica 42, 139–156 (1996).
Preuss, U. W. et al. Psychometrische evaluation der deutschsprachigen version der Barratt-Impulsiveness-Skala. Nervenarzt 79, 305–319 (2008).
Spunt, B. Spunt/Bspmview: Bspmview V. 2016 1108 https://doi.org/10.5281/ZENODO.168074 (2016).
Hellrung, L. NFLearning_SNVTA_PublicRepo. https://doi.org/10.5281/zenodo.6772232 (2022).
Murty, V. P. et al. Resting state networks distinguish human ventral tegmental area from substantia nigra. Neuroimage 100, 580–589 (2014).
The authors would like to thank Silvia Maier and Stephan Nebe for comments on previous versions of the manuscript and fruitful discussions. This project was supported by the European Union’s Horizon 2020 research and innovation program under the Grant Agreement No 794395 (to LH) and grant 100014_165884 and 10001C_188878 from the Swiss National Science Foundation (to PNT). MK received grant support from the National Bank Fellowship (McGill) and Swiss National Science Foundation (P2SKP3_178175).
The authors declare no competing interests.
Peer review information
Communications Biology thanks Mai-Anh Vu, Kymberly Young and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: George Inglis. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hellrung, L., Kirschner, M., Sulzer, J. et al. Analysis of individual differences in neurofeedback training illuminates successful self-regulation of the dopaminergic midbrain. Commun Biol 5, 845 (2022). https://doi.org/10.1038/s42003-022-03756-4