Dissociable roles of cortical excitation-inhibition balance during patch-leaving versus value-guided decisions

In a dynamic world, it is essential to decide when to leave an exploited resource. Such patch-leaving decisions involve balancing the cost of moving against the gain expected from the alternative patch. This contrasts with value-guided decisions that typically involve maximizing reward by selecting the current best option. Patterns of neuronal activity pertaining to patch-leaving decisions have been reported in dorsal anterior cingulate cortex (dACC), whereas competition via mutual inhibition in ventromedial prefrontal cortex (vmPFC) is thought to underlie value-guided choice. Here, we show that the balance between cortical excitation and inhibition (E/I balance), measured by the ratio of GABA and glutamate concentrations, plays a dissociable role for the two kinds of decisions. Patch-leaving decision behaviour relates to E/I balance in dACC. In contrast, value-guided decision-making relates to E/I balance in vmPFC. These results support mechanistic accounts of value-guided choice and provide evidence for a role of dACC E/I balance in patch-leaving decisions.


2) vmPFC E/I balance relates to weighting of reward information during value-guided choice
In the main text, we report a negative relationship between vmPFC E/I balance and choice accuracy. To investigate this further, we quantified the degree to which participants' choices were guided by the options' expected values, using the logistic regression already reported in the main text. Similar to the relationship between E/I balance and choice accuracy reported in the main manuscript, we found that E/I balance in vmPFC was related to the degree to which participants' choices were governed by expected

3) Simultaneous regression of all behavioural parameters of interest against E/I balance in dACC and vmPFC
Some of our dependent variables may be correlated with each other across participants. This is expected, since some of the tests investigate parameters that we assume to be driven by a shared underlying mechanism 1. For instance, consider the percentage of correct choices on the one hand and the effect of value difference on RT on the other. As can be seen from Supplementary Table 2, there is a negative correlation between these two variables, indicating that the (negative) effect of value difference on RT is most pronounced in participants with a high percentage of correct choices. This, however, is exactly what would be mechanistically predicted from models using competition via mutual inhibition: slowing the decision in the face of a lot of noise (a difficult trial with low value difference) allows the choice to be dominated by the available evidence, while averaging out (neural) noise over time. To assess the orthogonal contributions of all the different behavioural parameters across both the patch-leaving and the value-guided choice phase, we therefore included all parameters of interest from both phases (Supplementary Tables 1 and 2) in one single regression model, now using either dACC or vmPFC E/I balance as the dependent variable. We still find a significant effect of patch-leaving advantage on dACC E/I balance (t20

4) Model Validation: Simulate and Recover
To validate our model fitting routines 2,3, we generated and recovered data for the model with the lowest BIC (prospect model with two free parameters). We generated 500 artificial data sets by randomly selecting both parameters in the range between 0 and 3. We then recovered these parameters from the artificial data with the same procedure as used for our real participants. We used 1000 random starting points to find the combination of free parameters yielding the minimal negative log likelihood across iterations. All fits were performed for each participant separately. We estimated the distance and the correlations between recovered parameters and ground truth parameters across subjects (Supplementary Figure S4), as well as the correlations between recovered parameters.
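The simulate-and-recover logic described above can be illustrated with a minimal sketch. It is not the fitting code used in the study: the functional form (power-law magnitude distortion combined with Prelec probability weighting), the parameter names `alpha` and `gamma`, the fixed inverse temperature `BETA`, the trial ranges, and the use of a plain random search in place of an optimizer with random starting points are all assumptions made for illustration. The default numbers of data sets and candidates are also far smaller than the 500 data sets and 1000 starting points reported.

```python
import math
import random

BETA = 1.0  # assumed fixed inverse temperature (not taken from the source)


def subjective_utility(magnitude, probability, alpha, gamma):
    """Hypothetical prospect-style utility: m**alpha times a Prelec weight."""
    weight = math.exp(-((-math.log(probability)) ** gamma))
    return (magnitude ** alpha) * weight


def p_choose_left(trial, alpha, gamma):
    """Softmax probability of choosing the left option on one trial."""
    (m_left, p_left), (m_right, p_right) = trial
    diff = BETA * (subjective_utility(m_left, p_left, alpha, gamma)
                   - subjective_utility(m_right, p_right, alpha, gamma))
    diff = max(-30.0, min(30.0, diff))  # clip to keep exp() and log() finite
    return 1.0 / (1.0 + math.exp(-diff))


def make_trials(n_trials, rng):
    """Random two-option trials; each option is (magnitude, probability)."""
    return [((rng.uniform(1, 10), rng.uniform(0.05, 0.95)),
             (rng.uniform(1, 10), rng.uniform(0.05, 0.95)))
            for _ in range(n_trials)]


def neg_log_lik(params, trials, choices):
    """Negative log likelihood of the observed choices under (alpha, gamma)."""
    alpha, gamma = params
    total = 0.0
    for trial, chose_left in zip(trials, choices):
        p = p_choose_left(trial, alpha, gamma)
        total -= math.log(p if chose_left else 1.0 - p)
    return total


def fit_random_search(trials, choices, n_candidates, rng):
    """Return the candidate in [0, 3] x [0, 3] with the lowest NLL."""
    candidates = [(rng.uniform(0, 3), rng.uniform(0, 3))
                  for _ in range(n_candidates)]
    return min(candidates, key=lambda c: neg_log_lik(c, trials, choices))


def recover(n_datasets=20, n_trials=100, n_candidates=200, seed=0):
    """Simulate data sets with known parameters and recover them."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_datasets):
        true = (rng.uniform(0, 3), rng.uniform(0, 3))
        trials = make_trials(n_trials, rng)
        choices = [rng.random() < p_choose_left(t, true[0], true[1])
                   for t in trials]
        pairs.append((true, fit_random_search(trials, choices,
                                              n_candidates, rng)))
    return pairs


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Calling `pairs = recover()` returns a list of `(true, recovered)` parameter tuples. Passing the true and recovered values of one parameter to `pearson` gives the kind of true-versus-recovered correlation summarized in a recovery figure, and passing the two recovered parameters gives the correlation between recovered parameters.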

Supplementary Figure S4. Overview of Simulated and Recovered Model Parameters: A) Correlation between true and recovered parameters and a histogram of the difference between true and recovered parameters for our winning model (see methods for model details). B) Correlations between recovered parameters. Source data are provided as a Source Data file.

In an earlier study, we have already reported relationships between vmPFC E/I balance and optimal choice behaviour 4. In particular, we had reported that high levels of GABA and low levels of glutamate, respectively, were related to participants' performance on difficult trials (those with low value difference), as measured by the softmax inverse temperature. This finding is exactly predicted by mechanistic models based on competition by mutual inhibition 4. However, a recent study found that choices were more strongly guided by multiplicative as opposed to additive value computation after administration of the NMDA receptor agonist d-cycloserine to healthy volunteers 5. Combining values multiplicatively is considered more optimal, whereas additive value integration is potentially less complex. In our own previous data, we found an effect of vmPFC E/I balance on softmax inverse temperature 4. In that work, however, we had not compared different models featuring multiplicative versus additive value construction, or a mixture of both. We have therefore reanalyzed our previous data with the same set of models as used in the current study. All magnitudes were rescaled between 1 and 10 prior to model fitting. We find that a hybrid model with no distortions in value weighting fits the data best. Since the EV hybrid model fits our previous data best, we assessed the relationship between this model's free parameters and E/I balance. One participant had to be excluded because GABA and glutamate could not be successfully detected 4.

As reported in the main text, for our present study, we find that a multiplicative SU model fits the data best. However, we did not obtain sufficient model recovery for the choice stochasticity parameter and therefore decided to fix it at the median recovered value. There are a number of possible reasons for this. First, in the 2012 data, the trials' combination of reward attributes had been specifically optimized (offline) for the value-guided choice task to allow a certain level of difficulty, to control for correlation between chosen and unchosen value, and to incorporate a certain range of no-brainer trials. In contrast, in the current task, reward magnitudes are generated from the chosen patch, a random fraction of which is allocated to the two patches. Small magnitude differences are therefore less likely to occur, which potentially prevents a reliable estimation of the choice stochasticity parameter.

Secondly, in the current task the distortion of reward magnitudes becomes more important, since magnitudes can potentially cover a wider range of values that depends on the current patch value, as opposed to a fixed minimum and maximum in the 2012 study 4.

Supplementary
EV models assume no distortions in value weighting, EU models assume distortions in reward magnitude weighting, EVPW models in reward probabilities, and SU models in both reward probabilities and magnitudes. Add models assume additive value integration, multi models multiplicative value integration, and hybrid models a combination of both. Source data are provided as a Source Data file.
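To make the distinction between additive, multiplicative and hybrid value integration concrete, the following sketch shows one possible parameterization without value distortions (the EV family). The weights `w` and `omega`, the exact form of the additive and hybrid rules, and all function names are assumptions for illustration. Only the rescaling of magnitudes to the range 1 to 10 and the softmax choice rule with an inverse temperature are taken from the text.

```python
import math


def rescale(magnitudes, lo=1.0, hi=10.0):
    """Linearly rescale magnitudes to [lo, hi]; assumes they are not all equal."""
    m_min, m_max = min(magnitudes), max(magnitudes)
    return [lo + (hi - lo) * (m - m_min) / (m_max - m_min) for m in magnitudes]


def value_multiplicative(magnitude, probability):
    """Multiplicative integration: expected value m * p."""
    return magnitude * probability


def value_additive(magnitude, probability, w):
    """Additive integration: weighted sum of attributes (w is hypothetical)."""
    return w * magnitude + (1.0 - w) * probability


def value_hybrid(magnitude, probability, w, omega):
    """Mixture of both rules; omega = 1 is purely multiplicative, 0 additive."""
    return (omega * value_multiplicative(magnitude, probability)
            + (1.0 - omega) * value_additive(magnitude, probability, w))


def p_choose_left(v_left, v_right, beta):
    """Softmax choice probability with inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (v_left - v_right)))
```

In this parameterization, the multiplicative and additive models are special cases of the hybrid model (`omega` equal to 1 or 0), so the mixture weight expresses how far choices follow one rule or the other.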

Supplementary Note: Exploratory Analysis - Drift Diffusion Modelling of Choice Data
To obtain a formal characterization of the process of evidence accumulation across trials, we fitted a hierarchical drift diffusion model (DDM) 6 9. We generated 5000 samples for every chain and discarded one half of all samples as burn-in 9. Every third sample was discarded for thinning, thereby reducing autocorrelations in the chains. To assess model convergence, we inspected the sampled posterior traces, their autocorrelation and the Gelman-Rubin statistic, which compares between-chain and within-chain variance 6,11. Models in which the Gelman-Rubin statistic for a group-level parameter deviated by more than 0.02 from one were defined as non-converged.
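The convergence criterion can be illustrated with a short sketch of the Gelman-Rubin statistic for one parameter sampled in several chains. This is the standard textbook form of the statistic, not necessarily the exact variant computed by the toolbox used in the study. The function names are hypothetical, and thinning is left out of the sketch.

```python
import math
import random
import statistics


def discard_burn_in(chain):
    """Drop the first half of a chain as burn-in."""
    return chain[len(chain) // 2:]


def gelman_rubin(chains):
    """Gelman-Rubin statistic for equally long chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def converged(chains, tolerance=0.02):
    """Apply the criterion from the text: statistic within 0.02 of one."""
    return abs(gelman_rubin(chains) - 1.0) <= tolerance


# Example with synthetic draws standing in for posterior samples
rng = random.Random(0)
chains = [discard_burn_in([rng.gauss(0.0, 1.0) for _ in range(5000)])
          for _ in range(4)]
print(round(gelman_rubin(chains), 3), converged(chains))
```

Chains that sample the same distribution give a value close to one, whereas a chain stuck in a different region inflates the between-chain variance and pushes the statistic above the threshold.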

Details of the behavioural task
All stimuli were presented on a grey (RGB: 60, 60, 60) background, with a contrast optimized for the MEG recording chamber, on a screen at a distance of one meter from the seated participants. Stimuli were displayed via a projector with a refresh rate of 75 Hz located outside the MEG recording chamber. During patch-leaving, participants were presented with Every time participants were rewarded, the progress bar grew in proportion to the obtained magnitude towards a goal state indicated by a golden rectangle (RGB: 184, 134, 11). The goal in the experiment was to reach the goal state as often as possible.