Episodic curiosity for avoiding asteroids: Per-trial information gain for choice outcomes drives information seeking

Humans often appear to desire information for its own sake, but it is presently unclear what drives this desire. The important role that resolving uncertainty plays in stimulating information seeking has suggested a tight coupling between the intrinsic motivation to gather information and performance gains, construed as a drive for long-term learning. Using an asteroid-avoidance game that allows us to study learning and information seeking at an experimental time-scale, we show that the incentive for information seeking can be separated from a long-term learning outcome, with information seeking best predicted by per-trial outcome uncertainty. Specifically, participants were more willing to accept time penalties to receive feedback on trials where the outcome of their choices was more uncertain. We found strong group- and individual-level support for a linear relationship between feedback request rate and information gain as determined by per-trial outcome uncertainty. This information better reflects filling in the gaps of the episodic record of choice outcomes than long-term skill acquisition or assessment. Our results suggest that this easy-to-compute quantity can drive information seeking, potentially allowing simple organisms to intelligently gather information for a diverse episodic record of the environment without having to anticipate the impact on future performance.

Scatterplot of random by-subject effects from the linear mixed effects model.

Figure S2 displays the intercepts and slopes of the relation between information gain and feedback request rate for each of the eight blocks of the test. There is little change in the association across blocks (i.e., time in the test); the slopes are similar. Consistently, we find no reliable effect of block on the association between information gain and feedback request (β = .00, p > .05), as summarized in Table S3.

to .24 (all ps < .05), and information gain coefficients range from .17 to .21 (all ps < .001).

Notes. All dfs subject to Satterthwaite corrections. *p < .05. **p < .01. ***p < .001.

Individual differences in information seeking and performance
Participants differed substantially in information seeking rates. To get a sense of how these rates relate to other variables, we compared them with information gain and steering reaction times (Fig S3). As seen in Fig S3, there appears to be no obvious relationship between overall feedback request rate and information gain, or between reaction time and either feedback request rate or information gain.

Figure S3. Summary of participants' feedback request rates, expected information gains, performance, and reaction times as a function of task difficulty (trajectory). All panels are sorted by subject (i.e., the top row refers to the same subject across all four panels). Lighter color reflects a higher value of the respective quantity.

An alternative to the episodic curiosity analysis described in the main paper is to test whether participants selected feedback with the aim of maximizing information gain. If so, then given a certain willingness to wait, the requests should be spent on the most informative (i.e., difficult) trajectories. This allowed us to express each subject's feedback selections as a fraction of what an ideal observer would obtain at the corresponding rate of feedback requests.
Moreover, to distinguish choice from chance level, we rescaled the proportion based on the range of possible information gains for each participant. The participant distribution of these normalized ideal information gains is displayed in Figure S4. As seen in Figure S4, only 4 participants selected less than 0.5 of the ideal information gain, and the remaining participants were clearly biased towards the ideal level of information gain given their feedback rate.
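
As a minimal sketch of the normalization described above, the Python function below compares the information gain a participant actually collected with the best and worst totals achievable with the same number of requests. The function name, its arguments, and the min-max rescaling are assumptions made for illustration; the exact procedure used in the analysis may differ.

```python
def normalized_ideal_fraction(info_gain, requested):
    """Rescale the information gain a participant collected to [0, 1].

    info_gain : per-trial expected information gains (hypothetical input).
    requested : per-trial booleans, True where feedback was requested.

    Assumption: 0 is the least informative selection possible with the same
    number of requests, and 1 is the ideal-observer selection.
    Returns None when the fraction is undefined.
    """
    if len(info_gain) != len(requested):
        raise ValueError("info_gain and requested must have the same length")

    k = sum(1 for r in requested if r)  # number of feedback requests
    n = len(info_gain)
    if k == 0 or k == n:
        return None  # no freedom of choice, so the range collapses

    # Information gain actually collected on requested trials.
    obtained = sum(g for g, r in zip(info_gain, requested) if r)

    ordered = sorted(info_gain)
    worst = sum(ordered[:k])   # k least informative trials
    best = sum(ordered[-k:])   # k most informative trials (ideal observer)
    if best == worst:
        return None  # all trials equally informative

    return (obtained - worst) / (best - worst)


if __name__ == "__main__":
    gains = [0.1, 0.5, 0.9, 1.0]
    print(normalized_ideal_fraction(gains, [False, False, True, True]))
```

Under this rescaling, a value of 1 means the requests were spent exactly as an ideal observer would spend them, and a value of 0 means they were spent on the least informative trials available.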

Figure S4. Participant distribution of normalized fraction of ideal info-gains.
To test how much of the overall variance in our data was accounted for by information gain, we performed a PCA on feedback requests across difficulty (trajectory) and participants. As seen in Fig S5,

Eye movements
Participants requested feedback by fixating the cockpit area on the screen. It is possible that this procedure guided participants' eye movements differently for different trajectories. To investigate this possibility, we aggregated the fixation positions of all participants separately for each of the ten trajectories. Three of these aggregated sets are displayed in Figure S6. As seen in Fig S6, there was no substantial difference in fixation positions between the easiest and most difficult trajectories. Importantly, fixations do not appear to have extrapolated the easiest trajectory (rightmost panel in Fig S6) to a point outside the feedback request box. Thus, there appears to be no confound whereby some trajectories accidentally drove fixations away from the feedback zone. In other words, extrapolating from the trajectories would bring the eyes within the feedback zone for all trajectories.
Note that the density of fixations in the feedback zone appears to decrease from the leftmost to the rightmost panel, indicating that feedback was requested more frequently the more difficult the avoidance decision was.