Fig. 2

Model analysis: how slow reward integration leads to undermatching. a Scheme of the model. The model integrates reward history over two timescales (τFast, τSlow) to estimate the expected income for each choice (red or green). These incomes are then combined to generate stochastic decisions, where PG (PR = 1 − PG) is the probability of choosing the green (red) target. While previous models18,25 focused on integration timescales shorter than the block size, here we assume that the long timescale τSlow is much longer than the block size, while the short timescale τFast remains shorter than the block size. The model combines the incomes estimated over the two timescales with adjustable relative weights: wFast for the relative weight of the fast integrator (τFast) and wSlow for the weight of the slow one (wFast + wSlow = 1). b, c Fast integration \((w_{\mathrm{Fast}} \gg w_{\mathrm{Slow}})\). If the weight of the fast integrator is much larger than that of the slow one, the model relies only on the recent reward history, estimated over an interval of approximately τFast. As a consequence, the estimated incomes fluctuate rapidly and with large amplitude. This noisy estimation leads to large fluctuations in the choice probability PG (red). Despite the fluctuations, the mean choice probability follows the matching law (solid red line in c). The fluctuations of PG are nonetheless large, as indicated by the broad shaded area denoting the standard deviation of PG. d, e Slow integration \((w_{\mathrm{Fast}} \ll w_{\mathrm{Slow}})\). If the weight of the slow integrator is much larger than that of the fast one, the model integrates rewards only on the long timescale τSlow. This eliminates the fluctuations in the choice probability; however, the choice probability is constant at 0.5, because the estimated incomes are balanced over multiple blocks of trials. The choice probability therefore becomes independent of the recent reward history, causing a strong (exploratory) deviation from the matching law (e). Note that the actual choice probability is determined by the overall color reward imbalance in the task (0.5, if no bias). f, g Mixed integration \((w_{\mathrm{Fast}} \simeq w_{\mathrm{Slow}})\). If the two integrators are similarly weighted, both the deviation from the matching law (undermatching) and the amplitude of the fluctuations are intermediate. This captures the experimental data and reflects a computational tradeoff between bias (long integrator; undermatching) and variance (short integrator; fluctuations). Parameters were set to τFast = 5 trials, τSlow = 10,000 trials, and wSlow = 0.3 for f, g. Note that our results do not rely on the precise choice of τFast and τSlow.
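The two-timescale integration scheme of panel a can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' published implementation: it assumes exponential (leaky) reward integrators with time constants τFast and τSlow, a weighted mixture of their income estimates, and a local-matching decision rule in which PG is the green fraction of the combined income; the block structure and reward probabilities are placeholders chosen for illustration.

```python
import numpy as np

def simulate(n_trials=20000, tau_fast=5, tau_slow=10000, w_slow=0.3,
             block_size=200, seed=0):
    """Minimal sketch of a two-timescale reward-integration model.

    Incomes for the green/red targets are estimated with leaky (exponential)
    integrators at two timescales and mixed with weights w_fast + w_slow = 1.
    The choice probability P_G is the green fraction of the combined income
    (a local-matching decision rule, assumed here for illustration).
    """
    rng = np.random.default_rng(seed)
    w_fast = 1.0 - w_slow
    a_fast, a_slow = 1.0 / tau_fast, 1.0 / tau_slow   # integrator leak rates

    inc_fast = np.zeros(2)   # [green, red] income estimate, fast integrator
    inc_slow = np.zeros(2)   # [green, red] income estimate, slow integrator
    p_green = np.zeros(n_trials)

    for t in range(n_trials):
        # Block structure: the richer target alternates every `block_size`
        # trials (reward probabilities are placeholders, not from the paper).
        rich_green = (t // block_size) % 2 == 0
        p_reward = np.array([0.4, 0.1]) if rich_green else np.array([0.1, 0.4])

        # Combine the two income estimates with the relative weights.
        income = w_fast * inc_fast + w_slow * inc_slow
        total = income.sum()
        p_g = income[0] / total if total > 0 else 0.5
        p_green[t] = p_g

        choice = 0 if rng.random() < p_g else 1          # 0 = green, 1 = red
        reward = float(rng.random() < p_reward[choice])

        # Leaky integration of the reward obtained on the chosen target.
        for inc, a in ((inc_fast, a_fast), (inc_slow, a_slow)):
            inc *= (1.0 - a)
            inc[choice] += a * reward

    return p_green
```

Under these assumptions, setting wSlow near 0 or 1 approximates the fast-only (b, c) and slow-only (d, e) regimes, while intermediate values such as wSlow = 0.3 correspond to the mixed regime of f, g.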
