People and organizations alike use rewards, from snacks to salary bonuses and frequent-flyer miles, to shape behaviour through a process called reinforcement learning. For example, if a dog receives a treat for rolling over in response to a verbal command, the likelihood of that behavioural response to the verbal cue will increase. Writing in Neuron, Sendhilnathan and colleagues1 describe neuronal signals that could support such reward-driven learning. What is remarkable is where the authors found these signals — not in the brain areas that have long been implicated in reinforcement learning, but in the cerebellum, a brain structure historically associated with error-driven, rather than reward-driven, learning.
The cerebellum is best known for its role in motor-skill learning — the process by which movements become smooth and accurate through practice. Fifty years of research2 supports the idea that when you practise a movement, such as your tennis backhand, the cerebellum uses feedback about errors to gradually refine the accuracy of the movement by weakening the neuronal connections that are responsible for those errors. It has been widely assumed that the cerebellum uses a similar, error-correcting learning algorithm to support cognition3, because the regions of the cerebellum that contribute to cognitive functions such as navigation4 and social behaviour5 have the same basic circuit architecture as those that control movement.
In the past three years, however, there has been a flurry of studies showing reward-related neuronal activity in the cerebellum6–12. What are reward signals doing in an error-correcting part of the brain? Sendhilnathan et al. leveraged the rapid learning abilities of monkeys to gain fresh insights into reward-related signalling in the cerebellum.
In each experimental session, the authors presented a monkey with two visual cues, neither of which the animal had seen before, on a computer screen. One arbitrarily assigned cue would result in the monkey receiving a reward of fruit juice if the animal responded by lifting its left hand; the other cue would result in a reward if the monkey lifted its right hand. The researchers monitored the activity of neurons called Purkinje cells in the cerebellum as the monkeys learnt, through trial and error, to make the correct response to each visual cue (Fig. 1).
Sendhilnathan et al. found that the activity of cerebellar Purkinje cells carried information about the success or failure of the monkey’s most recent attempt at the task. One subpopulation showed high activity following a correct response to the cue; another showed high activity following a failed attempt. These signals arose a few hundred milliseconds after the end of a trial and persisted until the next trial was completed. As such, they seemed to provide a working memory that could enable the outcome of one trial to guide the next behavioural choice.
These signals are reminiscent of those carried by neurons in frontal and parietal regions of the brain’s cerebral cortex, which encode the ‘value’ of different behavioural choices on the basis of reward history over multiple trials13. In the current study, the cerebellar neurons kept track of only the most recent trial’s outcome. But in this task, the outcome of a single trial provides sufficient information for the monkey to infer the correct response for the next trial — if a reward was not given when a monkey lifted its right hand in response to one visual cue, for instance, then the correct response to that cue must be to lift the left hand, and the correct response to the other visual cue would be to lift the right hand. It would be interesting to know whether cerebellar neurons can keep track of a more-extended history of rewards should the task require it, and whether the cerebellum interacts with the cerebral cortex in performing this computation.
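The inference described above can be made concrete with a toy sketch (illustrative only; the function name and cue labels are hypothetical, and this is not the authors' model). Because the two cues are mapped to opposite hands, the outcome of a single trial fixes the correct response for both cues:

```python
# Toy sketch of the one-trial inference described above (hypothetical
# names; not the authors' model). Cues are 'A' and 'B'; responses are
# 'left' or 'right'. The two cues map to opposite hands, so one trial's
# outcome determines the correct response for BOTH cues.

def infer_mapping(cue, response, rewarded):
    """Return the cue-to-hand mapping implied by a single trial."""
    other = {'left': 'right', 'right': 'left'}
    # If the response was rewarded it was correct; otherwise the
    # opposite hand must be correct for this cue.
    correct = response if rewarded else other[response]
    other_cue = 'B' if cue == 'A' else 'A'
    # The remaining cue takes the remaining hand.
    return {cue: correct, other_cue: other[correct]}

# Example from the text: lifting the right hand to one cue went
# unrewarded, so that cue must require the left hand, and the other
# cue the right hand.
print(infer_mapping('A', 'right', rewarded=False))
# {'A': 'left', 'B': 'right'}
```

This is why a memory of only the most recent outcome suffices for this particular task; a task with more cues, or with probabilistic rewards, would require integrating outcomes over a longer history.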
Importantly, information about the previous trial’s outcome was present in the cerebellum only when a new set of cue–response associations was being learnt. As monkeys improved their performance over trials, the neuronal activity encoding each outcome waned. Moreover, the signal was not present when monkeys earned rewards by responding to a pair of visual cues that they had mastered through several months of training. These observations indicate that cerebellar neurons are not simply carrying information about rewards, predictions about rewards or the movements that animals make when anticipating rewards. Rather, the cerebellum seems to contribute specifically to learning about how to earn rewards in a new situation. The authors speculate that the cerebellum might enhance the rate of learning about rewards, a possibility supported by the recent discovery in rodents of direct, excitatory projections from the cerebellum to neurons in the brain stem that release the reward-associated neurochemical dopamine14.
There are several intriguing parallels between the signals found by Sendhilnathan and colleagues and the signals involved in cerebellar control of movement. First, as with reward-driven learning, for some motor skills, cerebellar Purkinje cells contribute selectively to new motor learning and not to performing older motor skills15,16. Second, Purkinje-cell activity carries information that could guide both ongoing behaviour and the induction of learning during motor- and reward-based learning17. Third, the Purkinje cells carry signals that could support working memory in the form of activity maintained from one trial to the next in reward-based learning, and in the form of activity maintained during a delay period between a cue and the motor response to the cue, which seems to support motor planning11,18. Finally, during both types of learning, individual Purkinje cells are active for a specific time period of a few hundred milliseconds, with information seemingly passed from cell to cell over time19. These striking parallels raise the possibility that the cerebellum performs a similar function for error-driven motor learning and reward-driven reinforcement learning.
We learn from both our successes and our failures. These two learning schemes were previously attributed to distinct brain structures, but the current results, along with those of others6–12, blur these mechanistic and conceptual boundaries. As such, the work highlights the need to consider how long-range interactions between brain areas support the shaping of behaviour by experience.
Nature 579, 202-203 (2020)