Previous models have shown that behavioural differences in reward-based tasks can be explained to some extent by differing reinforcement learning 'meta-parameters' such as the animal's learning rate, exploration–exploitation balance (a measure of how much an animal explores the environment for options rather than using already gathered information) and future discounting (a measure of how much an animal favours receiving a reward immediately rather than waiting for it). Most of the current models assume that these meta-parameters are constant; in vivo, however, the neurobiological entities that underlie them, such as neurotransmitters and neural activity in particular brain areas, are dynamic. Luksys et al. therefore tested whether varying the values of meta-parameters in a reinforcement learning model could predict behaviour more accurately.
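As a rough illustration of what these three meta-parameters do, the sketch below uses a generic textbook temporal-difference formulation. It is not the model that Luksys et al. fitted. The symbols `alpha` (learning rate), `beta` (inverse temperature, which sets the exploration–exploitation balance) and `gamma` (discount factor), along with all function names, are assumptions introduced here for illustration.

```python
import math


def softmax_probs(q_values, beta):
    """Turn action values into choice probabilities.

    beta (assumed name) is the inverse temperature: beta = 0 gives uniform
    random exploration; large beta gives near-greedy exploitation.
    """
    top = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (q - top)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def td_update(q_current, reward, q_next_best, alpha, gamma):
    """One temporal-difference update of a single action value.

    alpha (assumed name) is the learning rate: how far the estimate moves
    towards the new target. gamma (assumed name) is the discount factor:
    how much the best value of the next state counts.
    """
    target = reward + gamma * q_next_best
    return q_current + alpha * (target - q_current)


def discounted_value(reward, delay, gamma):
    """Present value of a reward that arrives `delay` steps in the future."""
    return reward * gamma ** delay
```

In this sketch, holding `alpha`, `beta` and `gamma` fixed corresponds to the constant meta-parameter assumption described above; letting them change across trials or conditions would be one way to express dynamic meta-parameters.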
The authors began by recording the behaviour of mice from two genetic strains, one with an anxious and one with a calm phenotype, during a conditioning task. The animals had to learn to poke their noses through a hole in the cage wall when a light was switched on in order to retrieve a reward. During the 8 days of training, selected groups of mice were exposed to stress and/or injected with an adrenergic agonist or antagonist. Twenty-six days later the mice underwent multiple recall trials that assessed the effects of these manipulations on their learning and memory, measured by the number and duration of nose pokes during light-on and light-off conditions.