Post-response βγ power predicts the degree of choice-based learning in internally guided decision-making

Choosing an option increases a person's preference for that option. This phenomenon, called choice-based learning (CBL), has been investigated separately in the contexts of internally guided decision-making (IDM, e.g., preference judgment), for which no objectively correct answer exists, and externally guided decision-making (EDM, e.g., perceptual decision-making), for which one objectively correct answer exists. In the present study, we compared these two types of decision-making to examine differences in the neural processes underlying CBL. Occupation preference judgment and salary judgment were used as the IDM and EDM tasks, respectively. To compare CBL between the two types of decision-making, we developed a novel measure of CBL: decision consistency. When CBL occurs, decision consistency is higher in the last-half trials than in the first-half trials. Electroencephalography (EEG) data showed that the change in decision consistency was positively correlated with fronto-central beta–gamma power after response in the first-half trials for IDM, but not for EDM. These results demonstrate, for the first time, a difference in CBL between IDM and EDM. Fronto-central beta–gamma power is expected to reflect a key process of CBL that is specific to IDM.

is not changed when no preference change exists. Consequently, the change in decision consistency is unaffected by the problem pointed out by Chen and Risen 1.

Simulation 2
Moreover, we conducted an additional simulation (Simulation 2) to confirm that decision consistency increases when choice-based learning (CBL) occurs. We used a simple CBL model in which the value of the chosen item ($V_{\mathrm{chosen}(i)}$) was increased (equation 1), whereas the value of the rejected item ($V_{\mathrm{rejected}(i)}$) was decreased (equation 2), as follows ($i$ is the index of the item; $t$ is the index of the trial):

$$V_{\mathrm{chosen}(i)}(t+1) = V_{\mathrm{chosen}(i)}(t) + \alpha\left(1 - V_{\mathrm{chosen}(i)}(t)\right) \qquad (1)$$

$$V_{\mathrm{rejected}(i)}(t+1) = V_{\mathrm{rejected}(i)}(t) - \alpha\,V_{\mathrm{rejected}(i)}(t) \qquad (2)$$

Here, $\alpha$ is the learning rate, which determines how much the model updates the item value. $\alpha$ was varied from 0 to 0.7 in steps of 0.05. The upper limit (0.7) was based on the 95% confidence interval (0.18-0.68) of the learning-rate estimates reported in previous studies of CBL 3 that fitted computational models to behavioral data. The item value ($V$) is restricted to the range between 0 and 1: in equation 1, $\alpha$ is multiplied by $(1 - V_{\mathrm{chosen}(i)}(t))$, and in equation 2 by $V_{\mathrm{rejected}(i)}(t)$, so that $V(t+1)$ remains between 0 and 1 after updating. We generated hypothetical data almost identically to the method described for Simulation 1, except that item values ranged between 0 and 1 (approximately 1/10 of the range in Simulation 1). The true preference was assigned randomly by sampling from a normal distribution with mean 0.55 and SD 0. Figure S1(c) presents the mean results of 10,000 simulations applying both equations 1 and 2. Decision consistency increased when the learning rate ($\alpha$) was neither 0 nor 0.7. When the learning rate was 0.7 and the decision noise was greater than 0.15, the change in decision consistency was around 0. These results indicate that decision consistency increases when CBL occurs, provided that neither the learning rate nor the decision noise is too high.
Figures S1(d) and S1(e) present the mean results of 10,000 simulations applying only equation 1 or only equation 2, respectively. In both cases, decision consistency increased except when the learning rate was 0. These results confirm that decision consistency increases when CBL occurs.
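To make the update rules of Simulation 2 concrete, the following Python sketch implements them together with a toy simulation loop. It assumes that equation 1 moves the chosen item's value toward 1 by α(1 − V) and that equation 2 moves the rejected item's value toward 0 by αV; the exact form of equation 2 is a reconstruction, not a quotation. The noisy-comparison choice rule and the function names `cbl_update` and `simulate` are hypothetical, and the decision-consistency calculation from Simulation 1 is not reproduced here.

```python
import numpy as np


def cbl_update(v, chosen, rejected, alpha, use_eq1=True, use_eq2=True):
    """Apply the CBL value updates in place to v (values kept in [0, 1])."""
    if use_eq1:
        # Equation 1: move the chosen item's value toward 1.
        v[chosen] += alpha * (1.0 - v[chosen])
    if use_eq2:
        # Equation 2 (assumed form): move the rejected item's value toward 0.
        v[rejected] -= alpha * v[rejected]
    return v


def simulate(true_pref, n_trials, alpha, noise, seed=0, use_eq1=True, use_eq2=True):
    """Hypothetical simulation loop; the choice rule below is an assumption."""
    rng = np.random.default_rng(seed)
    # Copy the true preferences and clip them into the allowed [0, 1] range.
    v = np.clip(np.asarray(true_pref, dtype=float), 0.0, 1.0)
    choices = []
    for _ in range(n_trials):
        i, j = rng.choice(len(v), size=2, replace=False)
        # Compare the two values perturbed by Gaussian decision noise (SD = noise).
        if v[i] + rng.normal(0.0, noise) >= v[j] + rng.normal(0.0, noise):
            c, r = i, j
        else:
            c, r = j, i
        choices.append((int(c), int(r)))
        cbl_update(v, c, r, alpha, use_eq1, use_eq2)
    return v, choices
```

Setting `use_eq2=False` or `use_eq1=False` mirrors the single-equation runs summarized in Figures S1(d) and S1(e); with `alpha=0`, the values never change, which corresponds to the no-CBL case.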

Controls of stimuli
To confirm that the average annual salary does not covary with the usage frequency of each occupation-related term, the number of Google (http://www.google.jp) web page hits (collected on 10 July 2013) was used to estimate the usage frequency of each term, as in previous studies. [4][5][6] No correlation was found between average annual salary and usage frequency (Pearson's r=0.14). Likewise, no correlation was found between average annual salary and word length (Pearson's r=-0.12).

Trial-based decision consistency
To observe the change in decision consistency at the level of individual trials, we calculated a trial-level index of CBL (trial-based decision consistency; Figure S2). The trial-based decision consistency score represents the proportion of stimuli that were consistently chosen or consistently rejected across consecutive trials including the same stimulus. For each pair of consecutive trials including the same stimulus (e.g., the first and second trials including "Lawyer", the second and third trials including "Lawyer", and so on), we counted the stimuli that were either chosen in both trials or rejected in both trials. This count was then divided by the total number of stimuli (i.e., 28) to give the trial-based decision consistency.

Ratings for respective occupations and consistency between the pre-rating and decision-making task (pre-rating-decision consistency)
Before the judgment tasks, participants rated all occupation words on two dimensions (preference and salary) using a computer-based visual analog scale. The following questions and scales were used: preference ("How much would you like to do the job?"; 1 = not at all, 100 = very much) and salary ("How much pay is given for the following occupations?"; 1 = very little, 100 = very much). The order in which these items were rated was randomized across participants.
Pre-ratings for preference and salary were used to confirm whether participants' criteria for internally guided decision-making (IDM) and externally guided decision-making (EDM) differed.
For this purpose, we counted the trials in which the decision in the judgment task agreed with the pre-rating values of the two word stimuli. For example, if a participant rated occupation A as 100 (very much) and occupation B as 1 (not at all) for preference, and chose occupation A over occupation B in the IDM task (occupational preference judgment), the trial was counted as consistent. The number of consistent trials was divided by the total number of trials in each task. This index (pre-rating-decision consistency) represents how often participants' decisions were consistent with pre-ratings of the same dimension (i.e., preference or salary).
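The counting rule above can be sketched in a few lines of Python. The function name `rating_decision_consistency` is hypothetical, and treating tied ratings as inconsistent is an assumption, since the text does not say how ties were handled. Passing survey-based average annual salaries instead of pre-ratings gives the average annual salary-decision consistency described in the next section.

```python
def rating_decision_consistency(trials, ratings):
    """Proportion of trials in which the choice agrees with the reference values.

    trials:  sequence of (chosen, rejected) item labels, one per trial.
    ratings: mapping from item label to its pre-rating (1-100) or, for the
             salary-based variant, its average annual salary.
    Tied values count as inconsistent here (an assumption).
    """
    consistent = sum(ratings[chosen] > ratings[rejected] for chosen, rejected in trials)
    return consistent / len(trials)


# Usage with the example from the text: A rated 100, B rated 1, A chosen over B.
ratings = {"A": 100, "B": 1, "C": 50}
trials = [("A", "B"), ("C", "A"), ("C", "B"), ("B", "C")]
score = rating_decision_consistency(trials, ratings)  # 2 of 4 trials consistent
```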

Average annual salary-decision consistency
Average annual salary-decision consistency represents how often participants' decisions were consistent with the actual average annual salary reported in a statistical survey by the Ministry of Health, Labour and Welfare of Japan. This index was calculated in the same way as the pre-rating-decision consistency, except that the average annual salaries from the survey were used instead of subjective pre-ratings.

were used as the reference during online recording; all electrodes were later re-referenced to the averaged earlobes. Blinks and eye movements were monitored with electrodes above and below the left eye (vertical electrooculogram, VEOG) and at the outer canthi of both eyes (horizontal electrooculogram, HEOG). Electrode impedance was kept below 5 kΩ.

EEG recordings
The EEG and EOG signals were amplified with a band-pass filter of 0.0159-120 Hz and digitized at a sampling rate of 1,000 Hz using an EEG recorder (EEG-1100; Nihon Kohden Corp., Tokyo, Japan).

Artifact rejection from EEG data
Epochs with irregular noise were identified and rejected using a computer algorithm combined with visual inspection. 7 Typical physiological artifacts such as eye blinks, eye movements, and muscle potentials were retained for the subsequent independent component analysis (ICA).
Extended infomax ICA was conducted to obtain 32 independent components (ICs) from the response-locked epochs of each participant. An equivalent current dipole was estimated for each IC (DIPFIT 2.2, an EEGLAB plug-in that uses the FieldTrip toolbox). ICs representing typical physiological artifacts and electrode artifacts were identified by visual inspection of their time courses, multi-trial event-related potential (ERP) images, power spectra, scalp topographies, and dipoles. On average, 9.38 ICs were rejected from each participant's data. The remaining ICs were back-projected onto the scalp electrodes to obtain artifact-free EEG data.

Statistical analysis of group-averaged ERSP
For the group-averaged event-related spectral perturbation (ERSP), we conducted three types of comparisons.
First, we performed one-sample t-tests on the ERSP in the first-half and last-half trials of each task to examine whether beta-gamma power increased after response in each task and each epoch (i.e., the first-half or the last-half trials). Second, we conducted paired t-tests comparing the ERSPs in the first-half and last-half trials of each task to examine whether beta-gamma power changed as the trials progressed. Third, we conducted paired t-tests comparing the ERSPs in the IDM (preference judgment) and EDM (salary judgment) tasks in each epoch.
We performed a cluster-based permutation test 8 for each t-test to correct for multiple comparisons across the large time-frequency space. In step 1, we calculated the t-value for each pixel of the time-frequency window (-200 to 600 ms, 2-60 Hz) at FCz. The resulting t-map was thresholded at an uncorrected parametric p<0.05.
In step 2, the MATLAB function bwconncomp was applied to identify clusters in the thresholded map, and the sum of the t-values in each cluster was calculated.
In step 3, to generate the distribution of cluster t-value sums under the null hypothesis, the following three sub-steps were repeated 2,000 times. First, the condition labels (e.g., first-half vs. last-half trials) were shuffled randomly. Second, as in steps 1 and 2, the t-value for each pixel was calculated and thresholded at an uncorrected parametric p<0.05. Third, the largest sum of absolute t-values within a cluster was recorded. The distribution generated by these iterations was used to calculate the critical value.
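The three steps can be sketched in Python for a paired comparison. This is not the authors' MATLAB code: `scipy.ndimage.label` with a 3x3 structuring element stands in for `bwconncomp` (whose 2-D default is 8-connectivity), the pixel threshold is assumed to be two-tailed, and shuffling the condition labels within each participant is implemented as a random sign flip of that participant's difference map, which is equivalent for a paired design. For the one-sample tests, the ERSP map itself can be passed as `diff`. Function names are hypothetical.

```python
import numpy as np
from scipy import ndimage, stats


def paired_t_map(diff):
    """Pixel-wise one-sample t-values; diff has shape (n_subjects, n_freqs, n_times)."""
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def find_clusters(tmap, t_thresh):
    """Return (t-value sum, mask) for each supra-threshold cluster.

    Positive and negative clusters are labelled separately; the 3x3 structure
    gives 8-connectivity, matching bwconncomp's 2-D default.
    """
    clusters = []
    for sign in (1, -1):
        labels, n_labels = ndimage.label(sign * tmap > t_thresh, structure=np.ones((3, 3)))
        for k in range(1, n_labels + 1):
            mask = labels == k
            clusters.append((tmap[mask].sum(), mask))
    return clusters


def cluster_permutation_test(diff, n_perm=2000, alpha=0.05, seed=0):
    """Steps 1-3: threshold, cluster, and compare against a max-cluster null."""
    rng = np.random.default_rng(seed)
    n = diff.shape[0]
    # Step 1: uncorrected parametric threshold (assumed two-tailed).
    t_thresh = stats.t.ppf(1 - alpha / 2, df=n - 1)
    # Step 2: clusters and their t-value sums in the observed data.
    observed = find_clusters(paired_t_map(diff), t_thresh)
    # Step 3: largest absolute cluster sum under random label swaps (sign flips).
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=n)[:, None, None]
        perm = find_clusters(paired_t_map(signs * diff), t_thresh)
        null_max[i] = max((abs(s) for s, _ in perm), default=0.0)
    # Permutation p-value for each observed cluster.
    return [
        (t_sum, (np.sum(null_max >= abs(t_sum)) + 1) / (n_perm + 1), mask)
        for t_sum, mask in observed
    ]
```

Each returned tuple holds the cluster's t-value sum, its permutation p-value, and a boolean mask over the time-frequency map; clusters with p<0.05 exceed the critical value of the null distribution.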

Statistical analysis of correlations
As with the group-averaged ERSP comparisons (t-tests), we performed a cluster-based permutation test 8 to correct for multiple comparisons across the large time-frequency space.
The procedure was almost identical to that used for the t-tests, with two differences: Pearson's correlation coefficients (r-values) were calculated instead of t-values, and in each of the 2,000 iterations the assignment of the variable to participants (instead of the condition labels) was shuffled randomly. For this permutation test, the frequency range was limited to the beta-gamma band (14-60 Hz).
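A Python sketch of the correlation variant follows. It is self-contained, so the clustering helper is repeated; as before, `scipy.ndimage.label` with 8-connectivity stands in for `bwconncomp`, and the two-tailed pixel threshold is an assumption. The r threshold is derived from the t threshold via r = t / sqrt(df + t²) with df = n − 2. Function names are hypothetical.

```python
import numpy as np
from scipy import ndimage, stats


def corr_map(power, behavior):
    """Pixel-wise Pearson r between power (n_subj, n_freqs, n_times) and behavior (n_subj,)."""
    p = power - power.mean(axis=0)
    b = behavior - behavior.mean()
    num = np.tensordot(b, p, axes=(0, 0))
    den = np.sqrt((p ** 2).sum(axis=0) * (b ** 2).sum())
    return num / den


def r_threshold(n, alpha=0.05):
    """|r| corresponding to a two-tailed uncorrected p = alpha with n - 2 df."""
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t / np.sqrt(n - 2 + t ** 2)


def clusters(rmap, thresh):
    """(sum of r, mask) for positive and negative supra-threshold clusters."""
    found = []
    for sign in (1, -1):
        labels, n_labels = ndimage.label(sign * rmap > thresh, structure=np.ones((3, 3)))
        for k in range(1, n_labels + 1):
            mask = labels == k
            found.append((rmap[mask].sum(), mask))
    return found


def correlation_cluster_test(power, behavior, n_perm=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    behavior = np.asarray(behavior, dtype=float)
    thresh = r_threshold(len(behavior), alpha)
    observed = clusters(corr_map(power, behavior), thresh)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        # Shuffle which participant each behavioral value belongs to.
        perm = clusters(corr_map(power, rng.permutation(behavior)), thresh)
        null_max[i] = max((abs(s) for s, _ in perm), default=0.0)
    return [(s, (np.sum(null_max >= abs(s)) + 1) / (n_perm + 1), m) for s, m in observed]
```

Restricting the analysis to 14-60 Hz amounts to slicing the frequency axis before calling the test, for example `power[:, (freqs >= 14) & (freqs <= 60)]`.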

Phase-amplitude cross-frequency coupling
As a further exploratory analysis of the physiological features of the post-response beta-gamma power in the first-half trials of the IDM (preference) task, we computed phase-amplitude coupling between beta-gamma (25-40 Hz) power and theta-alpha phase (
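The coupling measure and the theta-alpha band limits are cut off in this section, so the following Python sketch only illustrates one common approach: a mean-vector-length (MVL) measure z-scored against surrogates in which the amplitude envelope is circularly shifted, which yields a z-scored coupling value of the kind reported as PACz. The 4-12 Hz phase band, the filter settings, the surrogate count, and the function names are assumptions; only the 25-40 Hz amplitude band comes from the text. The sketch works on a single continuous signal, whereas epoch-based data would require pooling across trials.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass filter using second-order sections."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def pac_z(x, fs, phase_band=(4.0, 12.0), amp_band=(25.0, 40.0), n_surr=200, seed=0):
    """MVL phase-amplitude coupling, z-scored against time-shifted surrogates.

    phase_band is a hypothetical theta-alpha range; amp_band is the 25-40 Hz
    beta-gamma range named in the text.
    """
    x = np.asarray(x, dtype=float)
    phase = np.angle(hilbert(bandpass(x, phase_band[0], phase_band[1], fs)))
    amp = np.abs(hilbert(bandpass(x, amp_band[0], amp_band[1], fs)))
    phasor = np.exp(1j * phase)
    observed = np.abs(np.mean(amp * phasor))

    rng = np.random.default_rng(seed)
    n = x.size
    surrogates = np.empty(n_surr)
    for k in range(n_surr):
        # Shift the amplitude envelope by a random lag (at least 10% of the
        # signal length) to destroy its timing relative to the phase series.
        shift = rng.integers(n // 10, n - n // 10)
        surrogates[k] = np.abs(np.mean(np.roll(amp, shift) * phasor))
    return (observed - surrogates.mean()) / surrogates.std()
```

Circular shifts preserve the amplitude and phase distributions while breaking their alignment, so a large positive z indicates coupling beyond what those distributions alone would produce.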

Trial-based decision consistency score
Results of this index are presented in Figure S3. For preference judgment, whether "Lawyer" is chosen or rejected at the next opportunity would be affected by whether a more preferred or a less preferred occupation is paired with it. Because the combination of the two options was determined randomly across trials, no change in epoch-based decision consistency would be observed if no CBL exists (Fig. S1(a) and (b)). However, this effect of the option combination acts as noise that reduces the statistical power to detect changes in decision consistency (both epoch-based and trial-based). Trial-based decision consistency is affected more strongly by this noise because fewer trials contribute to each index value.
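The trial-based decision consistency defined in the methods can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that every stimulus appears equally often; any extra appearances are ignored. Each returned value corresponds to one pair of consecutive presentations (first-second, second-third, and so on).

```python
from collections import defaultdict


def trial_based_consistency(trials, n_stimuli=28):
    """Trial-based decision consistency for each pair of consecutive presentations.

    trials: (chosen, rejected) stimulus labels in presentation order.
    Returns one score per consecutive pair: the number of stimuli chosen on both
    or rejected on both presentations, divided by n_stimuli.
    """
    outcomes = defaultdict(list)  # stimulus -> [True if chosen, False if rejected, ...]
    for chosen, rejected in trials:
        outcomes[chosen].append(True)
        outcomes[rejected].append(False)
    # Assumes equal presentation counts; use the smallest to stay in range.
    n_pairs = min(len(seq) for seq in outcomes.values()) - 1
    return [
        sum(seq[k] == seq[k + 1] for seq in outcomes.values()) / n_stimuli
        for k in range(n_pairs)
    ]


# Toy example with four stimuli, each presented three times.
trials = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)]
scores = trial_based_consistency(trials, n_stimuli=4)
```

In the toy example, stimuli 0 and 3 keep their outcome between the first and second presentations while 1 and 2 switch, giving 0.5 for the first pair; all four keep their outcome between the second and third presentations, giving 1.0.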

Pre-rating decision consistency
Pre-rating decision consistency represents how often participants' decisions were consistent with pre-ratings of the same dimension (i.e., occupational preference or salary). The results of this index are shown in Fig. S3. Post-hoc tests of the interaction revealed that, in the preference dimension, the index was higher in the IDM (preference judgment) task than in the EDM (salary judgment) task (p<0.001). In the salary dimension, the index was higher in the EDM (salary judgment) task than in the IDM (preference judgment) task (p<0.001). These results indicate that participants applied different decision criteria in the two types of decision-making task.

Reaction time
Figure S3(c) summarizes the mean reaction times (RTs) and shows that RTs were shorter in the last-half trials than in the first-half trials in both the IDM (preference) and EDM (salary) tasks. Consistent with this observation, a two-way repeated-measures ANOVA (two tasks (IDM, EDM) × two epochs (first-half and last-half trials)) revealed a significant main effect of epoch.

The pre-stimulus baseline ERSP was calculated using the same settings as the response-locked ERSP. Although no significant cluster was found in the permutation tests, positive post-response correlations were observable in IDM after applying the pre-stimulus baseline (see Fig. S4). In addition, when the averaged power was extracted from the time-frequency window of interest (350-500 ms and 25-40 Hz), a significant correlation with the change in decision consistency was observed (r=0.46, p<0.05). Although the effect size was smaller, possibly because the pre-stimulus baseline was contaminated by post-response activity from the preceding trial, these results confirm that post-response beta-gamma activity in the first-half trials correlates with the change in decision consistency in IDM.

Figure S5 presents the response-locked ERSP at FCz for each task (IDM and EDM) and each epoch (first-half and last-half trials), the significant clusters from one-sample permutation t-tests, and scalp topographies of the mean ERSP within each significant cluster. In all four conditions, the one-sample permutation t-tests yielded a significant increase in beta-gamma power from around 400 ms after response (cluster t-value sum = 4098.69, cluster count = 1057, corrected p<0.05 for the first-half trials of the IDM task; cluster t-value sum = 5565.87, cluster count = 1522, corrected p<0.05 for the last-half trials of the IDM task

Figure S5. Response-locked event-related spectral perturbation (ERSP) images at FCz for the first-half and last-half trials in the IDM (preference) and EDM (salary) judgment tasks. Scalp topographies of mean power within each significant cluster are shown on the right. IDM denotes internally guided decision-making. EDM denotes externally guided decision-making.
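The window-of-interest check described above (mean power in 350-500 ms and 25-40 Hz, correlated across participants with the change in decision consistency) can be sketched in Python as follows. The array layout (participants × frequencies × times, with times in ms) and the function name are assumptions; `scipy.stats.pearsonr` supplies r and its p-value.

```python
import numpy as np
from scipy import stats


def roi_correlation(ersp, times, freqs, behavior, t_win=(350, 500), f_win=(25, 40)):
    """Correlate mean ERSP in a time-frequency window with a per-participant measure.

    ersp: (n_participants, n_freqs, n_times); times in ms; freqs in Hz.
    """
    t_mask = (times >= t_win[0]) & (times <= t_win[1])
    f_mask = (freqs >= f_win[0]) & (freqs <= f_win[1])
    # Average over the selected frequencies and time points for each participant.
    roi_power = ersp[:, f_mask][:, :, t_mask].mean(axis=(1, 2))
    r, p = stats.pearsonr(roi_power, behavior)
    return r, p
```

Averaging over a predefined window reduces the analysis to a single correlation, so no cluster correction is needed, at the cost of ignoring effects outside the window.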
Figure S6 presents the phase-amplitude coupling between theta-alpha phase and beta-gamma power around 425 ms after response at FCz for each task (IDM and EDM) and each epoch (first-half and last-half trials). No significant coupling was found at a corrected threshold of p<0.05. Correlation analyses between PACz in the first-half trials and the change in decision consistency in the IDM and EDM tasks also revealed no significant correlation (corrected p<0.05).

Figure S6. Results of phase-amplitude coupling around 425 ms after response at FCz for each condition. IDM denotes internally guided decision-making. EDM denotes externally guided decision-making.