Fig. 3 | Nature Communications

Fig. 3

From: Deviation from the matching law reflects an optimal strategy involving learning over multiple timescales

Model analysis: the bias-variance trade-off and the optimal choice behavior under uncertainty. a–c Analytical model results for two volatility conditions, with a block size of 100 trials (more volatile) or 10,000 trials (less volatile). Solid (dashed) lines show the results for the more (less) volatile task, with a block size of 100 (10,000) trials. a Squared bias (blue) and variance (orange) of the model's inference about reward rates trade off as a function of the relative weight of the slow integrator (wSlow). b The squared error of the model's inference about reward rates, which is the sum of the squared bias (blue in a) and the variance (orange in a), plotted against wSlow for the two volatility conditions. The optimal wSlow for the volatile environment (solid vertical line) is smaller than that for the stable environment (dotted vertical line), because more volatile environments (solid curve) require faster integration, that is, a smaller relative weight of the slow integrator. c Deviation from the matching law (shown as the slope of block-wise choice fraction vs reward fraction) covaries with the relative weight of the slow integrator and with the volatility condition. d, e Model simulation results on the same experimental schedule experienced by the monkeys. d Model simulations show a clear trade-off between undermatching (a form of behavioral bias) and the variance of choice probability, as a function of the relative weight of slower learning, wSlow. The square root of the variance is shown for illustration. e Model simulations also show changes in harvesting efficiency as a function of wSlow. As a result of the bias-variance trade-off, the curve takes an inverted U-shape, with a maximum at the optimal relative weight, which is determined by the volatility of the experiment.

For panel d, we computed the variance of the monkey's choice probability as follows. First, the monkey's choice time series was smoothed with two half-Gaussian kernels with standard deviations of σ = 8 trials and σ = 50 trials, each with a span of 200 trials. This gave two time series: a fast one (σ = 8 trials) and a slow one (σ = 50 trials). We defined the variance as the variance of the fast series over the slow one. The square root of this variance is shown in the panel. For panels d, e, models with different values of wSlow were simulated on the experimental schedules experienced by Monkey F. We set τFast = 2 trials and τSlow = 1000 trials. Note that our results do not rely on the precise choice of τFast and τSlow (see also Fig. 1).
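
The smoothing procedure described for panel d can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that choices are coded as 0/1, that the half-Gaussian kernel is causal (it weights only the current and past trials), and that "the variance of the fast one over the slow one" means the mean squared deviation of the fast series from the slow series. The function names `half_gaussian_kernel`, `causal_smooth`, and `choice_variability` are hypothetical.

```python
import numpy as np


def half_gaussian_kernel(sigma, span=200):
    """Half-Gaussian weights over `span` trial lags, normalized to sum to 1.

    Lag 0 is the current trial; larger lags are further in the past
    (assumption: the kernel is causal).
    """
    lags = np.arange(span)
    weights = np.exp(-0.5 * (lags / sigma) ** 2)
    return weights / weights.sum()


def causal_smooth(choices, sigma, span=200):
    """Smooth a 0/1 choice series using only the current and past trials."""
    x = np.asarray(choices, dtype=float)
    weights = half_gaussian_kernel(sigma, span)
    smoothed = np.empty_like(x)
    for t in range(len(x)):
        n = min(t + 1, span)          # fewer trials are available early on
        w = weights[:n]
        past = x[t::-1][:n]           # x[t], x[t-1], ..., most recent first
        smoothed[t] = np.dot(w, past) / w.sum()
    return smoothed


def choice_variability(choices, sigma_fast=8, sigma_slow=50, span=200):
    """Square root of the variance of the fast series around the slow series.

    Assumption: "variance of the fast one over the slow one" is read here as
    the mean squared deviation of the fast series from the slow series.
    """
    fast = causal_smooth(choices, sigma_fast, span)
    slow = causal_smooth(choices, sigma_slow, span)
    variance = np.mean((fast - slow) ** 2)
    return np.sqrt(variance)
```

Calling `choice_variability(choices)` on a 0/1 array of trial-by-trial choices returns a single non-negative number. Under these assumptions a perfectly constant choice series yields a value of essentially zero, while a series that fluctuates on a timescale of a few trials yields a larger value.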
