Insights about the common generative rule underlying an information foraging task can be facilitated via collective search

Social learning is beneficial for efficient information search in unfamiliar environments (“within-task” learning). In the real world, however, possible search spaces are often so large that decision makers cannot cover all options, even if they pool their information collectively. One strategy for handling such overload is to develop generalizable knowledge that extends to multiple related environments (“across-task” learning). However, it is unknown whether and how social information may facilitate such across-task learning. Here, we investigated participants’ social learning processes across multiple laboratory foraging sessions in spatially correlated reward landscapes that were generated according to a common rule. Paired participants improved their information-search efficiency across sessions more than solo participants did. Computational analysis of participants’ choice behavior revealed that this improvement across sessions was related to a better understanding of the common generative rule. Rule understanding was correlated within pairs, suggesting that social interaction is key to improving across-task learning.


Statistical analysis
All statistical analyses were implemented using a set of packages in Python (v3.7.6).

Probability of finding the optimal option
We define the "box" with the highest reward in each environment as the optimal option. As seen in Supplementary Fig. S2, the mean proportion of sessions (out of 6) in which participants found the optimal option was slightly higher in the pair condition than in the solo condition (solo: 0.43; pair: 0.48). We analyzed the proportion of sessions with optimal-option discovery using a binomial GLM with condition as a fixed effect, coded as a dummy variable (solo: 0; pair: 1). The model was fit using MCMC; the 95% Bayesian credible interval of the condition effect ([-0.06, 0.52]) included 0, indicating no credible difference in the proportion of optimal-option discovery between the two conditions.
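The GLM above can be written as k_i ~ Binomial(6, p_i) with logit(p_i) = β0 + β1 · pair_i, where k_i is the number of sessions in which participant i found the optimal option. The following is a minimal sketch of fitting such a model by MCMC. It assumes Normal(0, 10²) priors on both coefficients and uses a hand-written random-walk Metropolis sampler; the priors, sampler, step size, and the simulated counts (60 hypothetical participants per condition) are illustrative assumptions, not the authors' settings or data.

```python
import numpy as np

N_SESSIONS = 6  # sessions per participant

def log_posterior(beta, k, pair, n=N_SESSIONS, prior_sd=10.0):
    """Log posterior (up to a constant) of a binomial GLM with a logit link.

    beta = (intercept, condition effect); pair is the 0/1 condition dummy.
    Assumed prior: independent Normal(0, prior_sd^2) on both coefficients.
    """
    eta = beta[0] + beta[1] * pair                        # linear predictor (log-odds)
    # Binomial log-likelihood without the constant log C(n, k): k*eta - n*log(1 + e^eta)
    loglik = np.sum(k * eta - n * np.logaddexp(0.0, eta))
    logprior = -0.5 * np.sum((np.asarray(beta) / prior_sd) ** 2)
    return loglik + logprior

def metropolis(k, pair, n_iter=20_000, step=0.15, seed=0):
    """Random-walk Metropolis sampler; returns post-burn-in draws of (beta0, beta1)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(2)
    lp = log_posterior(beta, k, pair)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        proposal = beta + rng.normal(0.0, step, size=2)   # symmetric Gaussian proposal
        lp_new = log_posterior(proposal, k, pair)
        if np.log(rng.uniform()) < lp_new - lp:           # Metropolis acceptance rule
            beta, lp = proposal, lp_new
        draws[i] = beta
    return draws[n_iter // 4:]                             # discard first 25% as burn-in

# Simulated placeholder data: 60 solo and 60 pair participants (hypothetical).
rng = np.random.default_rng(1)
pair = np.repeat([0, 1], 60)
k = rng.binomial(N_SESSIONS, np.where(pair == 1, 0.48, 0.43))

draws = metropolis(k, pair)
lo, hi = np.percentile(draws[:, 1], [2.5, 97.5])
print(f"95% credible interval for the condition effect: [{lo:.2f}, {hi:.2f}]")
print("interval includes 0" if lo <= 0 <= hi else "interval excludes 0")
```

The hand-written sampler only makes the model explicit; a probabilistic programming library would normally handle sampling and convergence diagnostics.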

Exploration pattern in terms of "migration" length
We examined the mean migration length, that is, the distance from the option chosen at trial t-1 to the option chosen at trial t, as another measure of participants' exploration behavior (Supplementary Fig. S4). We measured migration length using the Manhattan (i.e., "city block") distance. As seen in the figure, in both the solo and pair conditions, participants initially migrated long distances ("exploration") but gradually settled in and capitalized on a local patch ("exploitation"). The median migration length in the pair condition (3.08) was smaller than that in the solo condition (3.27), but the difference between the two conditions was not significant (t(117) = -1.67, p = 0.1, d = -0.32, CI: -0.94 to 0.14; Supplementary Fig. S4, left). Even when we decomposed the mean migration length data into the first (trials 1 to 8), second (trials 9 to 16), and third (trials 17 to 25) phases, we found no significant difference between conditions in any phase.
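A minimal sketch of the migration-length measure follows, assuming each choice is recorded as (x, y) grid coordinates. The text does not specify how migrations are assigned to phases. Here, as an assumption, each migration counts toward the phase of the trial on which it ends, so the first phase contains the 7 migrations ending on trials 2 to 8.

```python
import numpy as np

def migration_lengths(choices):
    """Manhattan distances between consecutive choices.

    choices: sequence of (x, y) grid coordinates, one per trial.
    Returns n_trials - 1 distances; entry i is the migration from
    trial i+1 to trial i+2 (1-based trial numbers).
    """
    xy = np.asarray(choices)
    return np.abs(np.diff(xy, axis=0)).sum(axis=1)

def phase_means(choices, phases=((1, 8), (9, 16), (17, 25))):
    """Mean migration length per phase (inclusive, 1-based trial ranges).

    Assumption: a migration belongs to the phase of the trial on which it ends.
    """
    d = migration_lengths(choices)
    end_trial = np.arange(2, len(d) + 2)                   # 1-based trial on which each migration ends
    return [d[(end_trial >= a) & (end_trial <= b)].mean() for a, b in phases]

choices = [(0, 0), (3, 4), (3, 5), (2, 5)]
print(migration_lengths(choices))                          # [7 1 1]
print(migration_lengths(choices).mean())                   # 3.0
```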

Subjective estimates about unchosen options
As a probe for participants' understanding of the environmental structure, we asked each participant to provide subjective estimates for the (unknown) rewards from options that she/he did not choose in each session. At the end of each session, participants reported their estimates for 16 options, which were randomly selected from the set of unchosen options.
Half of these 16 options had high rewards (equal to or higher than 0.5 on the min-max normalized scale) and the other half had low rewards (Supplementary Fig. S11). As a measure of the accuracy of subjective estimates, we calculated the mean absolute deviation of
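The probe-selection step and an accuracy score can be sketched as follows. The definition of the accuracy measure is incomplete above. This sketch therefore assumes, hypothetically, that the deviation is taken between each subjective estimate and the option's true reward on the same min-max normalized scale. It also assumes that normalization is computed over all options in the environment. The function names and the 11 x 11 environment are illustrative.

```python
import numpy as np

def minmax(x):
    """Min-max normalize rewards to [0, 1] (assumed: over all options in the environment)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def select_probe_options(rewards, chosen, n_per_group=8, rng=None):
    """Randomly pick unchosen options: n_per_group with normalized reward >= 0.5, n_per_group below."""
    rng = np.random.default_rng() if rng is None else rng
    r = minmax(rewards)
    unchosen = np.setdiff1d(np.arange(len(r)), chosen)
    high = unchosen[r[unchosen] >= 0.5]
    low = unchosen[r[unchosen] < 0.5]
    return np.concatenate([rng.choice(high, n_per_group, replace=False),
                           rng.choice(low, n_per_group, replace=False)])

def mean_abs_deviation(estimates, true_values):
    """Hypothetical accuracy score: mean |estimate - true normalized reward| (lower is better)."""
    return float(np.mean(np.abs(np.asarray(estimates, float) - np.asarray(true_values, float))))

rng = np.random.default_rng(0)
rewards = rng.uniform(size=121)                            # hypothetical 11 x 11 environment
chosen = rng.choice(121, size=25, replace=False)           # hypothetical choices in one session
probes = select_probe_options(rewards, chosen, rng=rng)    # 16 options to ask about
```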

Gaussian process regression
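A Gaussian process is defined as:

$$f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big),$$

where the real-valued scalar outputs $\mathbf{f} = (f(\mathbf{x}_1), f(\mathbf{x}_2), \ldots, f(\mathbf{x}_N))$ follow a Gaussian distribution with mean $m(\mathbf{X})$ and covariance matrix $\mathbf{K}$, the elements of which are denoted by $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$. As described in Methods, we used a Radial Basis Function (RBF) kernel $k_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}')$ to express the covariance:

$$k_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\lambda^{2}}\right),$$

where the length-scale parameter $\lambda$ (> 0) determines how quickly correlations between options $\mathbf{x}$ and $\mathbf{x}'$ decay towards zero: the smaller the $\lambda$, the more rapidly correlations between options decay with increasing distance between them.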
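A minimal NumPy sketch of GP regression with the RBF kernel above follows. It assumes a zero prior mean, a small fixed observation-noise variance, and (x, y) grid coordinates as inputs; these settings and the function names are illustrative assumptions rather than the values used in Methods. The last line combines posterior mean and SD into an upper-confidence-bound value, the quantity the UCB family of models builds on. The exploration weight of 0.5 is a placeholder.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale):
    """k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2)) for all pairs of rows."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def gp_posterior(X_obs, y_obs, X_new, length_scale, noise_var=1e-4):
    """Posterior mean and SD at X_new for a zero-mean GP conditioned on (X_obs, y_obs)."""
    K = rbf_kernel(X_obs, X_obs, length_scale) + noise_var * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_new, length_scale)           # cross-covariances, shape (n_obs, n_new)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # alpha = K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v ** 2).sum(axis=0)                       # prior variance k(x, x) = 1 for the RBF kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical 11 x 11 grid with two observed options.
grid = np.array([(x, y) for x in range(11) for y in range(11)], dtype=float)
X_obs = np.array([[2.0, 3.0], [7.0, 8.0]])
y_obs = np.array([0.5, -0.2])
mean, sd = gp_posterior(X_obs, y_obs, grid, length_scale=2.0)
ucb = mean + 0.5 * sd                                      # placeholder exploration weight
```

Near an observed option the posterior mean approaches the observed reward and the SD shrinks toward zero. Far from all observations the SD returns to the prior value of 1. How far "far" is depends on the length-scale.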

Alternative models
In Supplementary Fig. S6, we show six variations of models in addition to the UCB+S model. All alternative models are nested in terms of the sampling policy. See Supplementary Table S2 for the mean AICs of these models.
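The nesting of the model family can be sketched as follows. This is a hypothetical parameterization, not the authors' exact model. It assumes that an option's value combines the GP posterior mean, an uncertainty bonus weighted by a parameter `beta`, and (for the +S models) an imitation bonus `imitation` added to the option the partner chose most recently. Choices are then made by a softmax with temperature `tau`. The AIC helper uses the standard definition AIC = 2k + 2 × (negative log-likelihood), where k is the number of free parameters.

```python
import numpy as np

MODELS = ["UCB+S", "Pure-Exploit+S", "Pure-Explore+S", "UCB", "Pure-Exploit", "Pure-Explore"]

def valuation(mean, sd, partner_choice, beta, imitation, model):
    """Option values under one model of the (hypothetically parameterized) UCB+S family."""
    base = model.replace("+S", "")
    v = np.zeros_like(mean, dtype=float)
    if base in ("UCB", "Pure-Exploit"):
        v += mean                                   # exploit: expected reward
    if base in ("UCB", "Pure-Explore"):
        v += beta * sd                              # explore: uncertainty bonus
    if model.endswith("+S") and partner_choice is not None:
        v[partner_choice] += imitation              # assumed form of the social term
    return v

def softmax(v, tau):
    """Choice probabilities with temperature tau (numerically stabilized)."""
    z = (v - v.max()) / tau
    p = np.exp(z)
    return p / p.sum()

def aic(neg_log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params + 2 * neg_log_lik

mean = np.array([1.0, 0.0, 0.0])
sd = np.array([0.0, 0.0, 1.0])
for m in MODELS:
    print(m, valuation(mean, sd, partner_choice=1, beta=1.0, imitation=0.5, model=m))
```

Under this sketch, setting `beta = 0` reduces UCB+S to Pure-Exploit+S. Setting `imitation = 0` reduces each +S model to its individual-learning counterpart.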

Parameter recovery
We checked the performance of parameter recovery for the UCB+S model by generating 100 artificial datasets with random values of the four parameters. As seen in Supplementary Fig. S9, the correlations between true and recovered individual parameter values were high and positive for all four parameters (r = 0.98, 0.86, 0.93, and 0.90). We also confirmed that the recovery process did not induce substantial correlations among the four recovered parameters (all correlations between recovered parameters were weak and negative; see Supplementary Fig. S10).
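The logic of a parameter-recovery check can be illustrated with a deliberately simplified sketch. It does not use the UCB+S model. Instead, it assumes a two-parameter toy model in which option values are m + beta * s for random stand-in values m and s, and choices follow a softmax with temperature tau. Fitting uses the equivalent reparameterization a = 1/tau, b = beta/tau, whose negative log-likelihood is convex, and then converts back. The trial counts, parameter ranges, and names are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

N_OPTIONS, N_TRIALS, N_DATASETS = 8, 500, 100

def simulate(tau, beta, rng):
    """Simulate softmax choices for one artificial participant (toy model)."""
    m = rng.uniform(size=(N_TRIALS, N_OPTIONS))              # stand-in for expected rewards
    s = rng.uniform(size=(N_TRIALS, N_OPTIONS))              # stand-in for uncertainties
    logits = (m + beta * s) / tau
    p = np.exp(logits - logsumexp(logits, axis=1, keepdims=True))
    u = rng.uniform(size=(N_TRIALS, 1))
    choices = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), N_OPTIONS - 1)  # inverse-CDF sampling
    return m, s, choices

def neg_log_lik(theta, m, s, choices):
    """Negative log-likelihood with theta = (a, b) = (1/tau, beta/tau)."""
    logits = theta[0] * m + theta[1] * s
    chosen = logits[np.arange(N_TRIALS), choices]
    return -(chosen - logsumexp(logits, axis=1)).sum()

def fit(m, s, choices):
    """Maximum-likelihood fit; returns (tau, beta)."""
    a, b = minimize(neg_log_lik, x0=np.array([1.0, 0.0]), args=(m, s, choices)).x
    return 1.0 / a, b / a

rng = np.random.default_rng(42)
true = np.column_stack([rng.uniform(0.1, 0.5, N_DATASETS),   # true tau
                        rng.uniform(0.0, 1.0, N_DATASETS)])  # true beta
recovered = np.array([fit(*simulate(tau, beta, rng)) for tau, beta in true])

for j, name in enumerate(["tau", "beta"]):
    print(f"{name}: r(true, recovered) = {np.corrcoef(true[:, j], recovered[:, j])[0, 1]:.2f}")
print(f"r(recovered tau, recovered beta) = {np.corrcoef(recovered.T)[0, 1]:.2f}")
```

For the UCB+S model, the same loop would simulate full task sessions and refit all four parameters. Only the `simulate` and `fit` steps would change.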

Numeric simulation of the effect of imitation bias on performance
We explored how task performance may change as a function of the magnitude of the imitation bias. In this simulation, we assumed that agents made choices according to the UCB+S model in the same task setup as in the experiment. We ran a total of 1,000 simulations. Performance followed an inverted U-shaped function of the imitation bias.
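A sweep over imitation-bias values can be organized as in the following sketch. It is a toy, not the UCB+S simulation described above. Its simplifying assumptions are:

- the landscape has no spatial correlation;
- each agent values unobserved options at a neutral 0.5 and observed options at their observed reward;
- the only social information is the partner's previous choice, which receives the imitation bonus.

The softmax temperature, landscape size, and number of runs are placeholders. The sketch shows only the structure of such a sweep and is not expected to reproduce the inverted U-shape reported above.

```python
import numpy as np

def run_pair(bias, rng, n_options=64, n_trials=25, tau=0.1):
    """Mean reward of two toy agents over one session; bias = imitation bonus."""
    rewards = rng.uniform(size=n_options)                    # hypothetical, spatially uncorrelated
    known = np.full((2, n_options), np.nan)                  # rewards each agent has observed
    last = [None, None]                                      # each agent's previous choice
    total = 0.0
    for _ in range(n_trials):
        choices = []
        for i in range(2):
            v = np.where(np.isnan(known[i]), 0.5, known[i])  # neutral value for unseen options
            if last[1 - i] is not None:
                v[last[1 - i]] += bias                       # imitate partner's previous choice
            p = np.exp((v - v.max()) / tau)
            choices.append(rng.choice(n_options, p=p / p.sum()))
        for i, c in enumerate(choices):                      # both agents choose, then observe
            known[i, c] = rewards[c]
            total += rewards[c]
        last = choices
    return total / (2 * n_trials)

def sweep(biases, n_runs=100, seed=0):
    """Average performance for each imitation-bias value."""
    rng = np.random.default_rng(seed)
    return np.array([np.mean([run_pair(b, rng) for _ in range(n_runs)]) for b in biases])

biases = np.linspace(0.0, 1.0, 6)
for b, perf in zip(biases, sweep(biases)):
    print(f"imitation bias {b:.1f}: mean reward {perf:.3f}")
```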
Recall that most participants in the experiment had a small to moderate imitation bias (median = 0.18; Fig. 3c). An imitation bias of this magnitude seems to have contributed to the superior performance of the pair condition relative to the solo condition (Fig. 2a; see also Supplementary Fig. S7).

Pure-Exploit+S: In the process of forming the valuation function of each option, the uncertainty of options has no effect on the learning process. Thus, participants are assumed to exploit their own information during learning but not to actively explore (i.e., they 'purely exploit' already-known information). Pure-Explore+S: In the process of forming the valuation function of each option, the expected rewards of options have no effect on the learning process. Thus, participants are assumed to explore not-yet-observed options during learning but not to actively exploit their own information (i.e., they 'purely explore' not-yet-observed information). UCB, Pure-Exploit, and Pure-Explore are sub-models of UCB+S, Pure-Exploit+S, and Pure-Explore+S, respectively, with the social information removed. That is, these models assume that, in the process of forming the valuation function of each option, the social information term has no effect on the learning process. Thus, participants behave as if they were learning individually, even though the partner's information is available.