# The neural encoding of information prediction errors during non-instrumental information seeking

• Scientific Reports volume 8, Article number: 6134 (2018)
• doi:10.1038/s41598-018-24566-x

## Abstract

In a dynamic world, accurate beliefs about the environment are vital for survival, and individuals should therefore regularly seek out new information with which to update their beliefs. This aspect of behaviour is not well captured by standard theories of decision making, and the neural mechanisms of information seeking remain unclear. One recent theory posits that valuation of information results from representation of informative stimuli within canonical neural reward-processing circuits, even if that information lacks instrumental use. We investigated this question by recording EEG from twenty-three human participants performing a non-instrumental information-seeking task. In this task, participants could pay a monetary cost to receive advance information about the likelihood of receiving reward in a lottery at the end of each trial. Behavioural results showed that participants were willing to incur considerable monetary costs to acquire early but non-instrumental information. Analysis of the event-related potential elicited by informative cues revealed that the feedback-related negativity independently encoded both an information prediction error and a reward prediction error. These findings are consistent with the hypothesis that information seeking results from processing of information within neural reward circuits, and suggest that information may represent a distinct dimension of valuation in decision making under uncertainty.

## Introduction

Seeking information is an important drive of behaviour, and a key component of effective decision making under uncertainty1. However, normative decision theory, which assumes that the value of information resides in its instrumental utility for acquiring future rewards2,3,4, provides a poor description of information seeking in humans and other animals. In particular, such theories cannot account for findings showing that animals place a positive value on information that resolves uncertainty but which cannot be used to affect future tangible outcomes (termed non-instrumental information). Human participants, for instance, display a clear preference for acquiring non-instrumental information about both aversive and appetitive future events5,6,7 and many species, including humans, exhibit a willingness to sacrifice part of an uncertain future reward in exchange for non-instrumental information about the reward’s likelihood6,8,9,10. These behavioural findings indicate that animals treat information as though it were of intrinsic value (cf. Grant, Kajii and Polak11).

One recent proposal, the ‘common currency’ hypothesis, is that the intrinsic value of information might result from common neural substrates for processing of rewarding and informative stimuli12. Neural recordings from non-human primates have demonstrated that non-instrumental information is encoded within brain regions typically associated with reward processing, such as the dopaminergic midbrain13, lateral habenula12 and orbitofrontal cortex9. Notably, Bromberg-Martin and Hikosaka12 reported that in response to informative stimuli, neurons in macaque lateral habenula encoded both reward prediction errors (RPEs; the signed difference between expected and actual reward) and information prediction errors (IPEs; the signed difference between expected and actual information). Similarly, functional magnetic resonance imaging (fMRI) in humans has revealed that the delivery of information is associated with increased blood-oxygen-level dependent signals within brain regions typically associated with reward processing, such as the striatum14,15. This resemblance suggests a common neural coding scheme for information and primary reward, which might result from mechanisms such as an intrinsic reward value of information12 or boosting of anticipatory utility by reward prediction errors associated with informative stimuli16.

To date, many predictions of the common currency hypothesis of information valuation have not been investigated in humans. To address this question, the present study recorded the electroencephalogram (EEG) from human participants completing a non-instrumental information seeking task, assessing willingness-to-pay for non-instrumental information6. On each trial, a lottery was drawn in which participants had an equal probability of winning (receiving 20 cents) or losing (receiving 0 cents). Prior to the lottery draw, participants could choose to view either an informative stimulus, which imparted early information about the lottery outcome, or a non-informative stimulus, which was perceptually identical to the informative stimulus but imparted no information about the lottery outcome (see task schematic in Fig. 1). Information was imparted in the form of five-card arrays of red and black cards; participants were informed that, should they choose to observe the informative stimulus, the relative proportions of red and black cards would provide information about the outcome of the lottery. Specifically, participants were informed that a majority of red cards would predict a loss in the lottery, whereas a majority of black cards would predict a win in the lottery. In the non-informative stimulus, by contrast, relative proportions of red and black cards were determined at random and therefore provided no information about the outcome of the subsequent lottery. This ensured that the informative and non-informative stimuli were perceptually identical to one another and differed only in terms of the degree to which they imparted information about the subsequent lottery. To assess participants’ willingness to pay for non-instrumental information, a variable cost was associated with viewing the informative stimulus, to be deducted from participants’ winnings in the case of a win outcome only.

The structure of trials in this task was tripartite: first, a choice between an informative stimulus and a non-informative stimulus; second, presentation of cards in the chosen stimulus; third, the presentation of the trial outcome in the form of monetary winnings. In order to investigate the neural correlates of the processing of non-instrumental information, we assessed event-related potentials evoked by the presentation of informative cards. This allowed us to distinguish between the neural correlates of information (cards which increased certainty about the outcome of the lottery, independent of whether certainty pertained to a win or a loss) and the encoding of the likelihood of winning the lottery (red versus black cards). Specifically, we investigated whether IPEs associated with non-instrumental information were encoded in the feedback-related negativity (FRN) component of the event-related potential (ERP). According to one prominent theory, FRN amplitude reflects RPEs following the disinhibition of neurons in anterior cingulate cortex by mesencephalic dopamine neurons17. In support of this contention, it has been shown that FRN amplitudes are greater following negative RPEs than positive RPEs18,19. Indeed, it has been proposed that the FRN could be reconceptualised as a ‘reward positivity’20 encoding the hedonic value of stimuli relative to expectations. Premised upon this dopaminergic RPE model of the FRN, the common currency hypothesis of information valuation therefore predicts that IPEs should also be encoded in the FRN, in a comparable fashion to RPEs. In addition, we also conducted two sets of exploratory analyses to explore the encoding of IPEs and RPEs in two ERP components temporally adjacent to the FRN: the N121, and the late positive potential22 (LPP).

## Results

### Behavioural results

One participant failed an attention check and was therefore excluded from further analysis (see Methods). Behavioural results (Fig. 2) replicated the overall findings of Bennett et al.6. A repeated-measures analysis of variance (ANOVA) revealed that preference for information was modulated by the cost of information (F(1.76, 36.99) = 58.02, p < 0.001, $\eta_p^2$ = 0.42). Participants displayed a strong preference for the informative stimulus when it was available at no cost (t-test against 0.5: t(21) = 16.96, p < 0.001), and a non-negligible preference for this stimulus when it was available at a cost (t-test against zero: t(21) = −3.55, p = 0.01). Preference for information decreased with increasing information cost (single-sample t-test of coefficients from a linear regression against zero: t(21) = −9.98, p < 0.001). There was also considerable inter-individual variability in preference for information as measured by overall proportion of choices to observe the informative stimulus (M = 0.39, range = 0.15 to 1). These results suggest that participants assigned an intrinsic value to the non-instrumental information imparted by the informative stimulus.
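As an illustration of the cost-sensitivity analysis described above, the following minimal sketch fits a linear regression of choice proportion on information cost for each participant, then tests the resulting slopes against zero with a single-sample t statistic. The cost levels, participant labels and choice proportions are hypothetical placeholders, not data from the present study, and the helper names `ols_slope` and `one_sample_t` are introduced here purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for a test of mean(values) against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1


# Hypothetical cost levels (arbitrary units) and per-participant
# proportions of trials on which the informative stimulus was chosen.
costs = [0, 1, 2, 3]
choice_props = {
    "p01": [0.95, 0.60, 0.30, 0.10],
    "p02": [1.00, 0.90, 0.85, 0.70],
    "p03": [0.80, 0.35, 0.10, 0.05],
}

# One regression slope per participant, then a group-level test of the slopes.
slopes = [ols_slope(costs, props) for props in choice_props.values()]
t_stat, df = one_sample_t(slopes)
print(f"mean slope = {mean(slopes):.3f}, t({df}) = {t_stat:.2f}")
```

A negative group-level t statistic in this sketch corresponds to the pattern reported above, in which preference for information falls as its cost rises.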

### ERP results

We investigated how RPEs and IPEs were encoded in the amplitude of the feedback-related negativity elicited by the presentation of informative cues (both card presentation and trial outcome screens; see Methods for further information regarding trial structure). RPEs were calculated as the discrepancy between expected lottery winnings prior to observing the stimulus and actual expected lottery winnings after observing the stimulus.

$\mathrm{RPE} = 20\,\bigl(\Pr(\mathrm{win})_{\mathrm{post}} - \Pr(\mathrm{win})_{\mathrm{prior}}\bigr)$
(1)

The number 20 in Equation 1 refers to the number of cents associated with a win outcome. Similarly, IPEs were calculated as the difference between the actual information content of a stimulus $I$ and its expected information content $I_{\mathrm{expected}}$:

$\mathrm{IPE} = I - I_{\mathrm{expected}}$
(2)

Information content was itself calculated as the reduction of belief entropy, as per information theory (see Methods for further information regarding the computation of these variables). Analogous to RPEs, a positive IPE occurred upon presentation of stimuli that conveyed more information than expected, and vice versa for negative IPEs. Crucially, the equiprobability of red and black cards meant that IPEs and RPEs were statistically independent of one another by design.
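To make Equations 1 and 2 concrete, the following sketch computes both prediction errors for a single card. It rests on assumptions that are not spelled out in this section: that a win corresponds to a majority of black cards among the five, that each unseen card is equally likely to be red or black, that belief entropy is the binary Shannon entropy of Pr(win), and that expected information is the average entropy reduction over the two equiprobable card colours. All function names are hypothetical, and this should be read as one plausible reading of the definitions rather than the exact computation used in the study.

```python
from math import comb, log2

N_CARDS = 5      # cards per stimulus
WIN_CENTS = 20   # value of a win outcome, in cents


def p_win(n_black, n_red):
    """Pr(win) given the cards seen so far, assuming a black majority wins
    and each unseen card is black with probability 0.5."""
    remaining = N_CARDS - n_black - n_red
    needed = (N_CARDS // 2 + 1) - n_black  # black cards still needed for a majority
    if needed <= 0:
        return 1.0
    if needed > remaining:
        return 0.0
    return sum(comb(remaining, k) for k in range(needed, remaining + 1)) / 2 ** remaining


def entropy(p):
    """Binary Shannon entropy (in bits) of a belief Pr(win) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def prediction_errors(n_black, n_red, card):
    """Return (RPE, IPE) for seeing `card` ('black' or 'red') after
    n_black black and n_red red cards have already been shown."""
    prior = p_win(n_black, n_red)
    post_black = p_win(n_black + 1, n_red)
    post_red = p_win(n_black, n_red + 1)
    post = post_black if card == "black" else post_red

    rpe = WIN_CENTS * (post - prior)                      # Equation 1
    info = entropy(prior) - entropy(post)                 # reduction of belief entropy
    expected_info = entropy(prior) - 0.5 * (entropy(post_black) + entropy(post_red))
    ipe = info - expected_info                            # Equation 2
    return rpe, ipe


# A red card after one red card lowers Pr(win) but raises certainty.
rpe, ipe = prediction_errors(n_black=0, n_red=1, card="red")
print(f"RPE = {rpe:.2f} cents, IPE = {ipe:.3f} bits")
```

Under these assumptions, a second red card following a first red card yields a negative RPE together with a positive IPE, whereas the first card of a stimulus yields a non-zero RPE and a zero IPE. This illustrates how the sign of one prediction error need not constrain the sign of the other.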

#### Reward prediction errors

We first examined whether the amplitude of the FRN evoked by the presentation of informative cards encoded RPEs, as predicted by a prominent reinforcement learning theory17. In line with previous studies19,23, we analysed FRN amplitudes using a 2 × 5 repeated-measures ANOVA with factors of RPE (positive, negative) and electrode (Fpz, AFz, Fz, FCz, Cz), which revealed a significant main effect of RPE on FRN amplitude (F(1, 14) = 6.09, p = 0.03, $\eta_p^2$ = 0.30), with negative RPEs (M = 3.08, SEM = 0.55) associated with a more negative FRN amplitude compared to positive RPEs (M = 4.49, SEM = 0.55; see Fig. 3A). This indicates that the FRN encoded RPEs in a typical fashion in the present study.
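For readers less familiar with the effect size reported here, partial eta squared for a single effect can be recovered from its F statistic and degrees of freedom using the standard identity $\eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2)$. The short sketch below applies this identity to the main effect of RPE reported above. The function name is hypothetical, and we do not assume that this is how the reported values were computed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of RPE on FRN amplitude: F(1, 14) = 6.09
eta_rpe = partial_eta_squared(6.09, 1, 14)
print(round(eta_rpe, 2))
```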

We also observed a significant main effect of electrode on FRN amplitude (F(1.62, 22.61) = 53.66, p < 0.001, $\eta_p^2$ = 0.49); however, this effect did not interact significantly with the effect of RPE (F(1.83, 25.55) = 0.98, p = 0.43).

In our exploratory analyses, we found a significant modulation of N1 amplitude by RPE sign (F(1, 14) = 6.78, p = 0.02, $\eta_p^2$ = 0.33), driven by a larger N1 component in response to negative RPEs (M = −1.81, SEM = 0.47) than to positive RPEs (M = −0.53, SEM = 0.14). There was no significant modulation of LPP amplitude by RPE sign (F(1, 14) = 1.24, p = 0.28).

#### Information prediction errors

As with the RPE analysis, we used a 2 × 5 repeated-measures ANOVA to investigate the effects of IPE (positive, negative) and electrode (Fpz, AFz, Fz, FCz, Cz) on FRN amplitude. Analogous to the effect of RPE, we observed a significant main effect of IPE on FRN amplitude (F(1, 14) = 7.75, p = 0.01, $\eta_p^2$ = 0.36), driven by significantly more negative FRN amplitudes in response to negative IPEs (M = 2.80, SEM = 0.66) than to positive IPEs (M = 4.94, SEM = 0.70; see Fig. 3B). These results indicate that both IPEs and RPEs were encoded by the FRN: negative prediction errors (both RPEs and IPEs) elicited more negative FRN amplitudes relative to positive prediction errors. As for the RPE analysis, we also found a significant main effect of electrode on FRN amplitude (F(1.71, 23.90) = 37.83, p < 0.001, $\eta_p^2$ = 0.43), but no interaction between electrode and IPE (F(4, 56) = 0.19, p = 0.93).

To assess the generality of these findings, we next conducted an additional control analysis to determine whether a similar modulation of FRN amplitudes was observed when zero IPE events were also included in analysis. To this end, a 3 × 5 repeated-measures ANOVA was used to assess the within-participants effects of IPE (positive, negative, zero) and electrode on the amplitude of the FRN. As above, this new analysis revealed a main effect of IPE for the informative stimulus (F(2, 28) = 6.03, p < 0.01, $\eta_p^2$ = 0.15). Consistent with the results of the main analysis, post-hoc paired-sample t-tests with Bonferroni correction for multiple comparisons indicated that this main effect was driven by a significantly more negative FRN for negative IPEs (M = 2.81, SEM = 0.66) than for positive IPEs (M = 4.94, SEM = 0.71; p = 0.04), as well as for negative IPEs relative to zero IPEs (M = 4.28, SEM = 0.59; p = 0.02). ERP waveforms for this analysis are presented in Supplementary Fig. S1.

In our exploratory analyses, we found a significant modulation of N1 amplitude by IPE (F(1, 14) = 4.85, p = 0.045, $\eta_p^2$ = 0.26), driven by a larger N1 component in response to negative IPEs (M = −1.87, SEM = 0.48) than to positive IPEs (M = −0.76, SEM = 0.20). There was no significant modulation of LPP amplitude by IPE sign (F(1, 14) = 3.83, p = 0.07).

#### Amount of information

We also examined whether the absolute amount of information delivered by stimuli was encoded in the FRN. This involves looking at information independent of expectations; positive information can be defined as becoming more certain of the trial outcome (both more certain of winning and more certain of losing), whereas negative information involves becoming less certain of the trial outcome. It is important to note that amount of information will tend to be positively correlated with the sign of IPEs, since trials with a positive IPE are a subset of all trials with a positive amount of information, and vice versa for negative IPEs. As such, this analysis should be considered an incremental modification of the IPE analysis presented above, rather than a discrete inquiry. In addition, since participants always learned the outcome by the end of the informative stimulus, relatively more cards decreased uncertainty than increased it. As such, trial numbers were not balanced between positive and negative information in this analysis.

Using a 2 × 5 repeated-measures ANOVA, we assessed the effect of information (positive, negative) and electrode (Fpz, AFz, Fz, FCz, Cz) on FRN amplitude. We found a significant main effect of information (F(1, 14) = 9.59, p < 0.01, $\eta_p^2$ = 0.41), with more negative FRN amplitudes for negative information (greater uncertainty; M = 2.80, SEM = 0.66) relative to positive information (greater certainty; M = 4.35, SEM = 0.45). Again, we observed a significant main effect of electrode (F(1.69, 23.66) = 61.30, p < 0.001, $\eta_p^2$ = 0.48), but no interaction effect between information and electrode (F(2.36, 32.97) = 0.87, p = 0.49). ERP waveforms for this analysis are presented in Supplementary Fig. S2.

In our exploratory analyses, we found a significant modulation of N1 amplitude by amount of information (F(1, 14) = 13.52, p < 0.01, $\eta_p^2$ = 0.49), driven by a larger N1 component in response to negative information (greater uncertainty; M = −1.87, SEM = 0.48) than to positive information (greater certainty; M = −0.71, SEM = 0.18). There was no significant modulation of LPP amplitude by amount of information (F(1, 14) = 3.16, p = 0.10).

#### Non-informative stimuli

As an additional control analysis, we next investigated whether the modulation of the FRN in response to RPEs and IPEs was unique to cards following a decision to view the informative stimulus, or whether similar patterns were observed for cards following a decision to observe the non-informative stimulus. To do this, we calculated pseudo-RPE and pseudo-IPE variables for cards in each non-informative stimulus as though they had instead been an informative stimulus (because technically, according to the formulae set out in the Methods section, IPEs and RPEs were always zero for cards in the non-informative stimulus).

We observed no modulation of the FRN by RPEs for the non-informative stimulus (F(1, 18) = 0.02, p = 0.90; see Supplementary Fig. S3). However, in an interesting parallel to the effect of IPE in the informative stimulus, we also observed a small effect of IPE (F(1, 18) = 4.58, p = 0.046, $\eta_p^2$ = 0.20), with a more negative FRN for negative pseudo-IPEs (M = 2.82, SEM = 0.67) than for positive pseudo-IPEs (M = 4.08, SEM = 0.42; see Supplementary Fig. S4). There was no significant effect of absolute amount of information on FRN amplitude for non-informative stimuli (F(1, 18) = 0.47, p = 0.50; see Supplementary Fig. S5).

In our exploratory analyses, we observed no significant modulation of N1 amplitude in the non-informative stimulus by IPEs (F(1, 18) = 1.97, p = 0.18), RPEs (F(1, 18) = 0.42, p = 0.53), or amount of information (F(1, 18) = 0.20, p = 0.66). Similarly, we observed no significant modulation of LPP amplitude in the non-informative stimulus by IPEs (F(1, 18) = 1.29, p = 0.27), RPEs (F(1, 18) = 0.93, p = 0.35), or amount of information (F(1, 18) = 0.61, p = 0.45).

#### Outcome screen ERPs

There was no significant difference in FRN amplitudes between win and loss outcomes, either when analyses were conducted on all trials pooled together (F(1,18) = 3.54, p = 0.08), or when analyses were conducted separately for outcome screens following an informative stimulus (F(1, 18) = 3.74, p = 0.07; see Supplementary Fig. S6) and for outcome screens following a non-informative stimulus (F(1, 17) = 0.11, p = 0.74; see Supplementary Fig. S7).

Similarly, in our exploratory analyses, we observed no significant modulation of N1 amplitude by outcome when all trials were pooled (F(1, 18) = 0.002, p = 0.96), or in trials following either informative stimuli (F(1, 18) = 0.07, p = 0.79) or non-informative stimuli (F(1, 17) = 0.82, p = 0.65). Finally, we also observed no significant modulation of LPP amplitude by outcome when all trials were pooled (F(1, 18) = 1.65, p = 0.22), or in trials following either informative stimuli (F(1, 18) = 1.28, p = 0.27) or non-informative stimuli (F(1, 17) = 0.06, p = 0.81).

#### Source localisation analyses

Finally, in order to assess whether differences in encoding of RPEs and IPEs were likely to reflect differences in the FRN, rather than in other ERP components that might be coactive with the FRN, we conducted an additional source-localisation analysis. In this analysis, we tested whether a candidate cortical generator for the FRN, the anterior cingulate cortex (ACC)24, displayed differential estimated activation as a function of prediction error sign, based on a reconstruction of ACC activation from EEG voltage recorded at the scalp (see Methods for further information).

Consistent with our identification of differences in frontocentral EEG activity as an FRN, we found that reconstructed ACC activity significantly differed as a function of prediction error sign, both for positive reward prediction errors relative to negative reward prediction errors (t(14) = 2.29, p = 0.04) and for positive information prediction errors relative to negative information prediction errors (t(14) = 2.53, p = 0.02). These findings give us further confidence in our accurate identification of an FRN in the present study.

## Discussion

This study used an information seeking task to investigate human participants’ preference for non-instrumental information in decision making under uncertainty. Using EEG, we assessed how both reward prediction errors and information prediction errors were reflected in the feedback-related negativity component of the event-related potential. Behavioural results replicated the overall pattern of findings previously reported by Bennett and colleagues6, consistent with an intrinsic valuation of information (cf. Grant et al.11). That is, participants displayed a clear preference for acquiring non-instrumental information, despite the fact that this information was at times associated with a direct monetary cost. Analyses of the ERP evoked by informative stimuli revealed that RPEs and IPEs were both encoded in a comparable fashion in the amplitude of the FRN component.

ERP analyses showed that the modulation of the FRN during task events that elicited positive and negative IPEs was consistent with FRN modulation by positive versus negative RPEs. The FRN has traditionally been considered to encode correct and incorrect responses in tasks17,25, as well as rewarding outcomes23,26. As such, our ERP analyses show a striking parallel in FRN encoding of informative and rewarding outcomes. This is conceptually consistent with the finding that firing rates of single neurons in primates respond in the same manner to positive/negative IPEs as to positive/negative RPEs12. Since FRN amplitude is thought to be related to dopaminergic projections to the anterior cingulate cortex24, the modulation of the FRN by positive and negative RPEs has been suggested as an index of dopaminergic reward processing17. As such, the finding that IPEs and RPEs were both reflected in a similar fashion in the FRN provides evidence in favour of the common currency hypothesis12, according to which the intrinsic value of information might result from its representation within canonical neural reward-processing circuits. It is important to note that IPEs and RPEs were encoded independently of one another for the task design employed in the present study. A positive RPE—that is, viewing a black card in the informative stimulus—could therefore be associated with either a positive, negative, or null IPE depending on the composition of cards preceding and succeeding the event.

Interestingly, our analyses of ERPs elicited by trial outcome screens revealed no significant modulation of the FRN by wins versus losses, only a non-significant trend. This is inconsistent both with previous FRN research showing that rewarding outcomes modulated the FRN23,26, and also with the modulation of the FRN by reward prediction errors in response to informative stimuli in the present study. These findings are likely to be due to underpowered statistical analyses, since the task employed in the present study included only one outcome screen event per trial, compared to five card stimuli per trial. In this light, we consider it likely that an experiment designed specifically to study outcome screens (utilising a higher overall number of trial outcome events) would detect a statistically significant modulation of FRN amplitudes by outcome screens. Alternatively, we note that according to one model of information-seeking behaviour, the value of reward-predictive stimuli may exceed the value of the rewarding outcome itself, as a result of the increase in anticipatory utility associated with positive predictive cues16.

We also conducted two sets of exploratory analyses to investigate the encoding of reward prediction errors and information prediction errors in two ERP components temporally adjacent to the FRN: the N1 and the LPP. Results of these exploratory analyses suggest several interesting lines of inquiry for future research. N1 analyses revealed a modulation of N1 amplitude by the sign of both RPEs and IPEs, with negative prediction errors of both kinds associated with a larger (more negative) N1 component than positive prediction errors. This finding is in line with a large body of literature demonstrating that the N1 component is modulated by the hedonic value and task salience of stimuli21,27,28, and is therefore in line with the broader claims of this paper concerning the reward value of information in decision making under uncertainty. In the case of the LPP, a prolonged positive-going deflection elicited by the presentation of affectively charged stimuli such as emotive images22,29, results of the present study were more circumstantial. We observed a non-significant trend toward an effect of IPE sign on LPP amplitude, which may suggest a differentiation in affective responses to positive versus negative prediction errors. Given the marginality of this statistical result and its status as an exploratory analysis, however, further research is required before any substantive conclusions can be drawn regarding this hypothesis.

Several recent findings have challenged the RPE-FRN model of Holroyd and Coles17. For instance, Talmi, Atkinson and El-Deredy30 found that FRN amplitude, in addition to increasing when reward was unexpectedly withheld, also increased when aversive outcomes were unexpectedly withheld30. Since unexpectedly withheld aversion represents a positive RPE, the Holroyd and Coles17 model predicts the opposite pattern. Similarly, Hauser and colleagues31 reported that the FRN was more strongly associated with the absolute value of RPEs, rather than signed RPEs, and therefore concluded that FRN amplitudes were driven more by surprise than by outcome valence31. The results of the present study may suggest an alternative interpretation of these past findings. Our findings demonstrate that the FRN encodes information as well as reward; this finding cannot be explained as a form of surprise encoding, since black and red cards were equally probable for each card draw, and therefore equally surprising according to standard operationalisations of stimulus-bound surprise32. Rather, it is possible that past findings demonstrating surprise encoding in the FRN may reflect a complex interaction between RPEs and IPEs. The present study, which averaged across positive and negative IPEs when analysing RPEs, and vice versa when analysing IPEs, had insufficient power to investigate the factorial interaction of RPEs and IPEs. This is an important subject for future research, which could thereby investigate whether there is any asymmetry in IPE encoding as a function of RPE sign, or of RPE encoding as a function of IPE sign.

Under the hypotheses set out above, we did not expect to find any effects of prediction errors on the FRN during non-informative task events. As expected, for these events there was no effect of RPEs on FRN amplitude, as well as no effect of the amount of outcome-relevant information. However, we did find a small but significant difference in amplitude of the FRN elicited by non-informative stimuli during the equivalent of positive and negative IPEs, and this effect was in the same direction as that observed in informative stimuli. One possible explanation for this finding is that, although participants did not receive information about the lottery outcome in the non-informative stimulus, this stimulus may have imparted incidental distributional information. That is, the relative proportions of red and black cards in the non-informative stimulus may have allowed participants to update their beliefs regarding the generative binomial rate of card colours. It is noteworthy in this respect that the effect of IPE in the non-informative stimulus was considerably smaller than the effect in the informative stimulus ($\eta_p^2$ of 0.20 compared to 0.36), and that this non-informative stimulus effect was reduced to a non-significant trend when trials with zero IPEs were included in analysis. Alternatively, given the simple and repetitive task design of the present study, another possible explanation for this finding is that participants might also have been unable to suppress tracking the cards ‘as if’ they contained outcome-relevant information. This might have reflected participants’ attempts to assess the generality of the learned associations of card colours. Although we instructed participants that cards in the non-informative stimulus were not predictive of outcomes, participants may nevertheless have attempted to ascertain whether this was truly the case in trials when they chose the non-informative stimulus.

The current study is the first to investigate similarities between information and reward processing in human participants using EEG. Our primary finding, that IPEs and RPEs are both reflected in the FRN, is conceptually consistent with previous studies showing that informative stimuli are encoded in brain regions traditionally associated with reward processing. These include the dopaminergic midbrain and lateral habenula12,13, the orbitofrontal cortex9, regions of the striatum14,15 and anterior insula33. Under the common currency hypothesis, and premised upon the assumption that the FRN denotes a reward positivity20, these findings suggest that acquiring information may be inherently rewarding, regardless of the instrumental use of the information provided. An expected-reward-maximising agent would not give up monetary reward for information that cannot be used to affect task outcomes. However, if information itself has an inherent motivational value, then this value can offset the monetary cost to the participant. It is important to note, however, that we are not able to conclude on the basis of these data alone that midbrain dopaminergic structures or the orbitofrontal cortex played a role in the processing of non-instrumental information. This degree of spatial resolution is beyond the scope of the scalp EEG recordings which we acquired in the present study. Instead, we note that the involvement of these structures in the encoding of information prediction errors has been established elsewhere, using single-neuron studies in macaque monkeys9,12 and fMRI studies in humans14. The primary contribution of the present study is to show that among human participants who displayed a preference for non-instrumental information, an ERP component thought to reflect processing of a neural reward prediction error also encoded an information prediction error, consistent with past findings in monkeys and humans.

Two distinct neural mechanisms have been proposed which can account for this ‘common currency’ of information and reward. Bromberg-Martin and Hikosaka12 posited that the resolution of uncertainty may itself be inherently rewarding, meaning that information has an explicit value unrelated to its instrumental utility for future planning. This explicit value was proposed to manifest in the encoding of IPEs in dopaminergic midbrain neurons. Alternatively, Iigaya and colleagues16 noted that animals awaiting the outcome of a lottery might experience anticipatory utility (dread of expected losses and savouring of expected wins). In such a scenario, Iigaya and colleagues16 proposed that the absolute value of reward prediction errors elicited by informative stimuli might provide an additive boost to this anticipatory utility. Such a mechanism could result in an apparent encoding of IPEs within canonical reward-processing neurons without the necessity of assuming an explicit value of information, since the card transitions that would be associated with increased/decreased utility in this way are the same as those associated with positive/negative information prediction errors (see Fig. 4C). As such, the findings of the present study are consistent with the mechanisms proposed by both Bromberg-Martin and Hikosaka12 and Iigaya et al.16. ERP components recorded at the scalp do not measure the activity of dopaminergic midbrain neurons directly, and even direct measures from these brain regions have proven insufficient to distinguish between these two accounts. As such, both models propose viable candidate physiological mechanisms for the observed findings of the present study. More broadly, the role of anticipatory utility in the valuation of information is an important topic for future study. Theories of anticipatory utility can describe individuals’ preferences over deterministic outcomes34, sequences of positive outcomes35, and two-period decision problems36. 
In determining the relationship of these theories to human information-seeking behaviour, an important topic for future study is the nature of individuals’ preference for non-instrumental information about losses (rather than gains, as in the present study).

In sum, the present study found that human participants exhibited a clear preference for acquiring non-instrumental information about future outcomes. Moreover, the neural encoding of information prediction errors displayed striking similarities to patterns of encoding of reward prediction errors. An updated decision theory in which information itself is a dimension of stimuli which contributes to their hedonic value (whether directly, via an explicit valuation of information, or indirectly, via an anticipatory boosting mechanism) may assist in explaining and predicting patterns of decision making in the presence of reducible uncertainty.

## Methods

### Participants

Participants were 23 healthy, right-handed adults (14 female, 9 male) aged between 18 and 32 years (M = 23.04, SD = 4.15). Participants completed two sessions of a non-instrumental information seeking task (see Fig. 1): a preliminary behavioural training session, and an EEG testing session. The present study reports behavioural and EEG results solely from the EEG testing session; behavioural results from the preliminary training session were previously reported as Experiment 2 in Bennett et al.6. Participants received monetary compensation of AUD \$10 per session, as well as all task winnings up to a maximum of \$15 (M = \$11.48, SD = 1.18). All participants provided written informed consent, and research was conducted in accordance with the Declaration of Helsinki. All study protocols were approved by The University of Melbourne Human Research Ethics Committee (ID 1341084).

### Protocol and apparatus

Before commencing the preliminary behavioural session, participants received verbal and written instructions on the task and were permitted to complete a practice task. Participants were informed that their choice of stimulus on each trial would not affect the likelihood of winning the lottery, and that the probability of winning and losing was equal on each trial.

The task was presented on a Dell P2210 LCD monitor (1680 × 1050 screen resolution; refresh rate 60 Hz) using the Psychophysics Toolbox37. On each trial, the participant chose between an informative and a non-informative stimulus by using the index finger of their right hand to press either the left or the right button of a five-button Cedrus response box. The left-right response mapping was pseudo-randomised across trials. Participants completed 7 blocks of 16 trials each while EEG was recorded, with a total testing duration of approximately 50 minutes.

In order to ensure that participants maintained attention on the chosen stimulus as it was revealed, a small number of trials (approx. 10%) were ‘catch trials’, in which one card in the chosen stimulus was drawn to reveal a white X rather than a red or black card. Participants were instructed to respond to this attention check by pressing any button within 1.5 seconds. Failure to do so resulted in the deduction of \$1 from overall winnings. This ensured that participants did not disengage from the task during stimulus presentation, and attended equally to both stimulus types.

In line with previous research using this task6, it was determined a priori that participants who failed to respond to more than two catch trials would be deemed to have failed an attention check. One participant failed to respond on four catch trials, and was therefore excluded from all further analysis. The remaining participants showed good levels of task engagement as measured by successful responses to catch trials (M = 98.11%, SD = 3.57%).

### EEG data acquisition and preprocessing

EEG data were acquired from 64 Ag/AgCl active scalp electrodes located according to the International 10–20 system. Data were recorded at a sampling rate of 512 Hz with a BioSemi ActiveTwo system, using an implicit reference during recording, and were linearly detrended and re-referenced offline to an average of left and right mastoids. The electrooculogram was recorded from two infraorbital electrodes horizontally adjacent to and below the left eye.

EEG data were preprocessed using EEGLAB38, according to a semi-automated preprocessing pipeline39,40. Data were high- and low-pass filtered at 0.1 Hz and 70 Hz respectively, and notch filtered from 45–55 Hz to remove background electrical noise. Data were segmented into epochs from 1000 ms before to 1000 ms after events of interest, and baseline corrected using a 100 ms pre-stimulus baseline. An Independent Component Analysis (ICA) using an infomax algorithm as implemented in EEGLAB was used to identify and remove components of the data related to eyeblink and saccade artefacts. The mean number of components removed per participant was 1.78 (range = 1 to 3). Components were identified by trained observers on the basis of their frontal scalp topography and symmetry in the sagittal plane. Data were inspected before and after eyeblink correction to evaluate the effectiveness of this correction, and any epochs for which eyeblink correction failed were manually excluded from further analysis. Noisy data channels were interpolated using a spline interpolation routine; no interpolated data channels were included in final ERP analyses. Finally, an impartial artefact screening procedure automatically excluded all epochs in which maximum/minimum amplitudes exceeded ±200 µV.
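The baseline-correction and amplitude-screening steps described above can be illustrated with a minimal Python sketch. It is not the EEGLAB pipeline used in the study: filtering, ICA and channel interpolation are omitted, the function names are hypothetical, and the rejection criterion is assumed to be an absolute-value threshold supplied by the caller in the same units as the data.

```python
def baseline_correct(samples, stim_index, srate_hz=512.0, baseline_ms=100.0):
    """Subtract the mean of the pre-stimulus baseline from every sample."""
    n_base = round(baseline_ms * srate_hz / 1000.0)
    baseline = samples[stim_index - n_base:stim_index]
    offset = sum(baseline) / len(baseline)
    return [v - offset for v in samples]


def exceeds_threshold(epoch, threshold):
    """True if any sample on any channel lies outside +/- threshold."""
    return any(abs(v) > threshold for channel in epoch for v in channel)


def screen_epochs(epochs, stim_index, threshold, **timing):
    """Baseline-correct every channel, then keep epochs within the threshold.

    epochs is a list of epochs; each epoch is a list of channels;
    each channel is a list of voltages.
    """
    kept = []
    for epoch in epochs:
        corrected = [baseline_correct(ch, stim_index, **timing) for ch in epoch]
        if not exceeds_threshold(corrected, threshold):
            kept.append(corrected)
    return kept
```

With the default timing arguments, the baseline spans the 51 samples immediately before stimulus onset (100 ms at 512 Hz, rounded to the nearest sample).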

### EEG data analysis

#### ERP analyses

ERP analyses were conducted using ERPLAB41. In line with previous research, FRN amplitudes were calculated for each condition as the mean amplitude from 200 to 350 milliseconds post-stimulus at the five fronto-central channels Fpz, AFz, Fz, FCz, and Cz26. These five channels are located above the medial frontal cortex, a candidate generator for the FRN17,23. The decision to use a 200–350 ms analysis window was motivated by identification of the FRN with this time period in several recent reviews24,42. The N1 was quantified as the mean amplitude from 150–200 milliseconds post-stimulus at electrodes Fz, F3, F4, Cz, C3, and C421. The late positive potential was quantified as the mean amplitude from 400–700 milliseconds post-stimulus at electrodes Cz, CPz, CP1, CP2, and Pz22,29.
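As a rough illustration of the mean-amplitude measure, the following Python sketch averages voltage across a set of channels and a latency window. It is not the ERPLAB routine used in the study; the function names and the dictionary-based data layout are assumptions made for the example, while the epoch start of -1000 ms and the 512 Hz sampling rate follow the acquisition details given above.

```python
FRN_CHANNELS = ["Fpz", "AFz", "Fz", "FCz", "Cz"]
FRN_WINDOW_MS = (200.0, 350.0)


def window_indices(start_ms, end_ms, epoch_start_ms=-1000.0, srate_hz=512.0):
    """Convert a latency window in ms to (first, last_exclusive) sample indices."""
    first = round((start_ms - epoch_start_ms) * srate_hz / 1000.0)
    last = round((end_ms - epoch_start_ms) * srate_hz / 1000.0)
    return first, last + 1  # include the sample nearest to end_ms


def mean_amplitude(erp, channels, start_ms, end_ms, **timing):
    """Mean voltage over the given channels and latency window.

    erp maps a channel label to the list of voltages of one averaged waveform.
    """
    first, last = window_indices(start_ms, end_ms, **timing)
    samples = [v for ch in channels for v in erp[ch][first:last]]
    return sum(samples) / len(samples)
```

Under these assumptions, an FRN amplitude for one condition would be obtained as `mean_amplitude(erp, FRN_CHANNELS, *FRN_WINDOW_MS)`, and the N1 and late positive potential measures differ only in the channel list and window passed in.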

For all analyses, ANOVA degrees of freedom were adjusted using the Greenhouse-Geisser correction where the assumption of sphericity was violated.

Finally, since behavioural results showed large differences in strategies between individuals, the number of epochs available for different ERP analyses differed between participants. ERP analyses therefore only included data from participants who had at least 20 epochs of each event type under consideration19. See Table S1 in the Supplementary Materials for a summary of the number of participants and trials included in each analysis.
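The inclusion rule can be expressed as a short Python sketch. The data layout (a mapping from participant ID to per-event epoch counts) and the event labels in the usage note are hypothetical; only the minimum of 20 epochs per event type comes from the text.

```python
MIN_EPOCHS = 20  # minimum usable epochs per event type, as stated above


def eligible_participants(epoch_counts, event_types, min_epochs=MIN_EPOCHS):
    """Return IDs of participants with enough epochs of every event type.

    epoch_counts maps participant ID -> {event type: number of usable epochs}.
    A missing event type is treated as zero epochs.
    """
    return [
        pid
        for pid, counts in epoch_counts.items()
        if all(counts.get(event, 0) >= min_epochs for event in event_types)
    ]
```

For example, a participant with 25 positive-RPE epochs but only 12 negative-RPE epochs would be retained for an analysis involving positive RPEs alone, but dropped from a comparison of positive against negative RPEs.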

#### Source localisation analyses

Source localisation was conducted on averaged FRN waveforms separately for each participant and each condition using the Brainstorm software package43. Activations were calculated using the unconstrained sLORETA method44 as implemented in Brainstorm, based on electrode locations relative to a standard Montreal Neurological Institute structural brain (Colin27) and a forward model based upon a boundary element model (BEM) generated with OpenMEEG software45. Anterior cingulate cortex activation was then calculated based on coordinates from a standard automatic parcellation of the cingulate cortex into ROIs46. Timecourses of source activity for the ACC were estimated as averages across left and right hemisphere ROIs of maximal elementary sources at each timepoint within the FRN analysis window.

### Quantification of computational variables

Epochs were binned for ERP analysis according to three different computational variables: positive/negative RPEs, positive/negative IPEs, and positive/negative information. Each of these variables was calculated according to the difference between win probabilities before and after observation of a card. This quantity was calculated based on the binomial probability that black cards would be in the majority in the informative stimulus, given conditional independence of successive card draws:

$\Pr(\text{win} \mid n, n_{req}) = 1 - \sum_{k=0}^{n_{req}-1} \binom{n}{k} 0.5^{n}$
(3)

where $n$ is the number of cards remaining to be drawn, and $n_{req}$ is the number of additional black cards required for a majority given the $n_{black}$ black cards already drawn:

$n_{req} = \begin{cases} 3 - n_{black}, & n_{black} < 3 \\ 0, & n_{black} \geq 3 \end{cases}$
(4)

By definition, no information is imparted by cards in the non-informative stimulus, and in this case $\Pr(\text{win})$ was always equal to 0.5.
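Equations 3 and 4 can be sketched in Python as follows. The function names are hypothetical, and the constant of three black cards for a majority is taken from Equation 4, which implies a stimulus of five cards (an assumption of this sketch).

```python
from math import comb

MAJORITY = 3  # black cards needed for a majority, from Equation 4


def n_required(n_black: int) -> int:
    """Equation 4: additional black cards still needed for a majority."""
    return max(MAJORITY - n_black, 0)


def pr_win(n: int, n_req: int) -> float:
    """Equation 3: probability of at least n_req black cards in n fair draws.

    math.comb(n, k) is 0 when k > n, so the result is 0.0 once a
    majority can no longer be reached.
    """
    fewer_than_needed = sum(comb(n, k) for k in range(n_req))
    return 1.0 - fewer_than_needed * 0.5 ** n
```

Before any card is drawn, `pr_win(5, n_required(0))` evaluates to 0.5, matching the equal win and loss probabilities described in the protocol. After a first black card, `pr_win(4, n_required(1))` is 11/16, and after a first red card, `pr_win(4, n_required(0))` is 5/16.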

The reward prediction error associated with each card was calculated as the difference between the participant’s expected lottery winnings prior to and following the card draw (see Equation 1).

Following Shannon47, the information content of a stimulus was defined as the entropy difference between posterior and prior beliefs:

$I = H(\Pr(\text{win})_{post}) - H(\Pr(\text{win})_{prior})$
(5)

With entropy defined as the binary entropy function:

$H(\Pr(\text{win})) = -\Pr(\text{win}) \log_2(\Pr(\text{win})) - (1 - \Pr(\text{win})) \log_2(1 - \Pr(\text{win}))$
(6)

Given $n$ and $n_{req}$, the expected information content of any card prior to its being revealed can therefore be calculated as the average of the amount of information that would be associated with one additional red card and the amount of information that would be associated with one additional black card:

$E[I] = \frac{H(\Pr(\text{win} \mid n-1, n_{req}-1)) + H(\Pr(\text{win} \mid n-1, n_{req})) - 2H(\Pr(\text{win} \mid n, n_{req}))}{2}$
(7)

This allows for IPEs to be calculated as per Equation 2. See Fig. 4 for a schematic overview of card transitions indicating events associated with positive versus negative RPEs and IPEs, and the differences between the two. Positive/negative RPEs indicate an increase in the likelihood of winning/losing the lottery, respectively, whereas positive/negative IPEs indicate that an event conveyed more/less information about the outcome than expected, regardless of whether that outcome was a win or a loss. Given that there was an equal probability of observing a red versus a black card at every point in time (i.e. each state transition in Fig. 4 was equiprobable), positive and negative RPEs and IPEs were therefore independent of one another for the task design used in the present study.
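Equations 5 to 7 can be sketched in Python as follows. The function names are hypothetical, and the sketch follows the sign convention of the equations exactly as written (posterior entropy minus prior entropy), so a card that reduces uncertainty yields a negative value. The IPE itself depends on Equation 2, which is defined earlier in the article and is not reproduced in this sketch.

```python
from math import comb, log2


def pr_win(n: int, n_req: int) -> float:
    """Equation 3: probability of at least n_req black cards in n fair draws."""
    return 1.0 - sum(comb(n, k) for k in range(n_req)) * 0.5 ** n


def entropy(p: float) -> float:
    """Equation 6: binary entropy in bits, with 0 * log2(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def information(p_prior: float, p_post: float) -> float:
    """Equation 5 as written: posterior entropy minus prior entropy."""
    return entropy(p_post) - entropy(p_prior)


def expected_information(n: int, n_req: int) -> float:
    """Equation 7: mean of Equation 5 over an equiprobable black or red card.

    Requires n >= 1, i.e. at least one card still to be drawn.
    """
    prior = pr_win(n, n_req)
    # A black card lowers n_req by one (clamped at zero once a majority is secured).
    after_black = pr_win(n - 1, max(n_req - 1, 0))
    # A red card leaves n_req unchanged.
    after_red = pr_win(n - 1, n_req)
    return (information(prior, after_black) + information(prior, after_red)) / 2.0
```

As a check on the logic, when a single card remains and decides the lottery (`n = 1`, `n_req = 1`), the prior win probability is 0.5 and either card resolves the outcome completely, so `expected_information(1, 1)` is -1.0 bit under this sign convention.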

### Data availability

Data are available upon reasonable request from the corresponding author.


## References

1. Kidd, C. & Hayden, B. Y. The psychology and neuroscience of curiosity. Neuron 88, 449–460 (2015).
2. Raiffa, H. & Schlaifer, R. Applied Statistical Decision Theory (Division of Research, Harvard Business School, Boston, 1961).
3. Howard, R. A. Information value theory. IEEE Transactions on Syst. Sci. Cybern. 2, 22–26 (1966).
4. Lawrence, D. B. The Economic Value of Information (Springer Science & Business Media, 2012).
5. Lanzetta, J. T. & Driscoll, J. M. Preference for information about an uncertain but unavoidable outcome. J. Pers. Soc. Psychol. 3, 96 (1966).
6. Bennett, D., Bode, S., Brydevall, M., Warren, H. & Murawski, C. Intrinsic valuation of information in decision making under uncertainty. PLoS Comput. Biol. 12, e1005020 (2016).
7. Zhu, J.-Q., Xiang, W. & Ludvig, E. A. Information seeking as chasing anticipated prediction errors. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (2017).
8. Zentall, T. R. & Stagner, J. Maladaptive choice behaviour by pigeons: an animal analogue and possible mechanism for gambling (sub-optimal human decision-making behaviour). Proc. Royal Soc. Lond. B: Biol. Sci. 278, 1203–1208 (2011).
9. Blanchard, T. C., Hayden, B. Y. & Bromberg-Martin, E. S. Orbitofrontal cortex uses distinct codes for different choice attributes in decisions motivated by curiosity. Neuron 85, 602–614 (2015).
10. Vasconcelos, M., Monteiro, T. & Kacelnik, A. Irrational choice and the value of information. Sci. Reports 5, srep13874 (2015).
11. Grant, S., Kajii, A. & Polak, B. Intrinsic preference for information. J. Econ. Theory 83, 233–259 (1998).
12. Bromberg-Martin, E. S. & Hikosaka, O. Lateral habenula neurons signal errors in the prediction of reward information. Nat. Neurosci. 14, 1209–1216 (2011).
13. Bromberg-Martin, E. S. & Hikosaka, O. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63, 119–126 (2009).
14. Kang, M. J. et al. The wick in the candle of learning: Epistemic curiosity activates reward circuitry and enhances memory. Psychol. Sci. 20, 963–973 (2009).
15. Jepma, M., Verdonschot, R. G., Van Steenbergen, H., Rombouts, S. A. & Nieuwenhuis, S. Neural mechanisms underlying the induction and relief of perceptual curiosity. Front. Behav. Neurosci. 6 (2012).
16. Iigaya, K., Story, G. W., Kurth-Nelson, Z., Dolan, R. J. & Dayan, P. The modulation of savouring by prediction error and its effects on choice. eLife 5, e13747 (2016).
17. Holroyd, C. B. & Coles, M. G. The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 109, 679 (2002).
18. Cohen, M. X. Neurocomputational mechanisms of reinforcement-guided learning in humans: a review. Cogn. Affect. & Behav. Neurosci. 8, 113–125 (2008).
19. Hajcak, G., Moser, J. S., Holroyd, C. B. & Simons, R. F. It’s worse than you thought: The feedback negativity and violations of reward prediction in gambling tasks. Psychophysiol. 44, 905–912 (2007).
20. Holroyd, C. B., Krigolson, O. E. & Lee, S. Reward positivity elicited by predictive cues. Neuroreport 22, 249–252 (2011).
21. Vogel, E. K. & Luck, S. J. The visual N1 component as an index of a discrimination process. Psychophysiol. 37, 190–203 (2000).
22. Schupp, H. T. et al. Affective picture processing: the late positive potential is modulated by motivational relevance. Psychophysiol. 37, 257–261 (2000).
23. Cohen, M. X., Elger, C. E. & Ranganath, C. Reward expectation modulates feedback-related negativity and EEG spectra. Neuroimage 35, 968–978 (2007).
24. Walsh, M. M. & Anderson, J. R. Learning from experience: event-related potential correlates of reward processing, neural adaptation, and behavioral choice. Neurosci. & Biobehav. Rev. 36, 1870–1884 (2012).
25. Miltner, W. H., Braun, C. H. & Coles, M. G. Event-related brain potentials following incorrect feedback in a time-estimation task: Evidence for a “generic” neural system for error detection. J. Cogn. Neurosci. 9, 788–798 (1997).
26. Hajcak, G., Moser, J. S., Holroyd, C. B. & Simons, R. F. The feedback-related negativity reflects the binary evaluation of good versus bad outcomes. Biol. Psychol. 71, 148–154 (2006).
27. Potts, G. F., Patel, S. H. & Azzam, P. N. Impact of instructed relevance on the visual ERP. Int. J. Psychophysiol. 52, 197–209 (2004).
28. Mason, L., O’Sullivan, N., Blackburn, M., Bentall, R. & El-Deredy, W. I want it now! Neural correlates of hypersensitivity to immediate reward in hypomania. Biol. Psychiatry 71, 530–537 (2012).
29. Hajcak, G., Dunning, J. P. & Foti, D. Motivated and controlled attention to emotion: time-course of the late positive potential. Clin. Neurophysiol. 120, 505–510 (2009).
30. Talmi, D., Atkinson, R. & El-Deredy, W. The feedback-related negativity signals salience prediction errors, not reward prediction errors. J. Neurosci. 33, 8264–8269 (2013).
31. Hauser, T. U. et al. The feedback-related negativity revisited: new insights into the localization, meaning and network organization. Neuroimage 84, 159–168 (2014).
32. Mars, R. B. et al. Trial-by-trial fluctuations in the event-related electroencephalogram reflect dynamic changes in the degree of surprise. J. Neurosci. 28, 12539–12545 (2008).
33. Preuschoff, K., Quartz, S. R. & Bossaerts, P. Human insula activation reflects risk prediction errors as well as risk. J. Neurosci. 28, 2745–2752 (2008).
34. Loewenstein, G. Anticipation and the valuation of delayed consumption. The Econ. J. 97, 666–684 (1987).
35. Loewenstein, G. F. & Prelec, D. Preferences for sequences of outcomes. Psychol. Rev. 100, 91 (1993).
36. Caplin, A. & Leahy, J. Psychological expected utility theory and anticipatory feelings. Q. J. Econ. 116, 55–79 (2001).
37. Brainard, D. H. The psychophysics toolbox. Spatial Vis. 10, 433–436 (1997).
38. Delorme, A. & Makeig, S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21 (2004).
39. Bennett, D., Murawski, C. & Bode, S. Single-trial event-related potential correlates of belief updating. eNeuro 2, ENEURO–0076 (2015).
40. Bode, S., Bennett, D., Stahl, J. & Murawski, C. Distributed patterns of event-related potentials predict subsequent ratings of abstract stimulus attributes. PLoS One 9, e109070 (2014).
41. Lopez-Calderon, J. & Luck, S. J. ERPLAB: an open-source toolbox for the analysis of event-related potentials. Front. Hum. Neurosci. 8 (2014).
42. Sambrook, T. & Goslin, J. A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychol. Bull. 141, 213–235 (2015).
43. Tadel, F., Baillet, S., Mosher, J. C., Pantazis, D. & Leahy, R. M. Brainstorm: a user-friendly application for MEG/EEG analysis. Comput. Intell. Neurosci. 2011, 8 (2011).
44. Pascual-Marqui, R. D. et al. Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find. Exp. Clin. Pharmacol. 24, 5–12 (2002).
45. Gramfort, A., Papadopoulo, T., Olivi, E. & Clerc, M. OpenMEEG: opensource software for quasistatic bioelectromagnetics. Biomed. Eng. Online 9, 45 (2010).
46. Destrieux, C., Fischl, B., Dale, A. & Halgren, E. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. Neuroimage 53, 1–15 (2010).
47. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 and 623–656 (1948).

## Acknowledgements

The authors wish to thank Hayley Warren and William Turner for help with data acquisition and feedback on the manuscript. This work was supported by a Faculty of Business and Economics (University of Melbourne) Strategic Initiatives Grant 2011 to S.B. and C.M. and an Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) to S.B. (DE140100350).

## Author information

### Author notes

1. Maja Brydevall and Daniel Bennett contributed equally to this work.

### Affiliations

1. The University of Melbourne, School of Psychological Sciences, Parkville, 3010, Australia: Maja Brydevall, Daniel Bennett & Stefan Bode
2. The University of Melbourne, Department of Finance, Parkville, 3010, Australia: Maja Brydevall, Daniel Bennett & Carsten Murawski

### Contributions

D.B., C.M. and S.B. designed the task. M.B. and D.B. collected the data. M.B. and D.B. analysed the data. M.B., D.B., C.M. and S.B. wrote the paper.

### Competing Interests

The authors declare no competing interests.

### Corresponding author

Correspondence to Daniel Bennett.