a, Schematic of the shaping protocol. Training consisted of nine mazes with increasing task difficulty. In the first five mazes, cues were permanent and were visible from the beginning of the trial (but still became progressively bigger as the mouse approached them). From maze 6 onward, cues only appeared when the mouse approached within 10 cm of their location. From maze 7 onward, cues could also appear on the unrewarded side. Cues were randomly distributed along the cue region. The number of cues on each side was sampled from a Poisson distribution with the mean indicated for each maze. b, Task performance, model fit and relative contributions of the behavioural variables throughout learning. The total number of neurons, the number of neurons with good model fit during the cue period (R² > 5%; these were used to calculate the relative contributions of the behavioural variables during the cue period), and the number of mice analysed in each training stage are indicated at the top. Shaded colours are s.e.m. The results show that task performance increased steadily across the permanent cue mazes, and then dropped in the first transient cue maze, probably owing to the working memory component that is added in the transient cue mazes. The overall R² value of the behavioural model increased across learning, indicating that over the course of training, neural activity could be better explained by the measured behavioural variables. Notably, the relative contribution of position increased monotonically during the permanent cue mazes, but then dropped during the transient cue mazes, similar to the performance of the mice across the mazes. This is consistent with the interpretation of positional ramps as reflecting a value signal (refs. 3,18), because the expected value at each position is closely related to reward expectation for that session, and reward expectation is determined by average task performance.
The relative contributions for cues also increased during early learning, consistent with being a reflection of the strength of the cue-reward association. Note that this value is decreased in the last maze, in which (because of the increased task difficulty) each cue has a lower predictive power with respect to reward. The relative contribution of previous reward decreased across the permanent cue mazes, then transiently increased during the first transient cue session. Because relying on previous reward is the wrong strategy in this task, this pattern in the relative contribution of previous reward may relate to mice weighting previous reward more heavily during the major steps in training when they have not yet learned the correct strategy for solving the task. The relative contribution of kinematics declined over the training procedure. This may be due to the kinematic aspect of the behaviour becoming less variable over training, as the mouse’s motor skills improved for virtual-reality navigation. The relative contribution of trial accuracy was significantly higher during the transient cue mazes than the permanent cue mazes. This result potentially suggests that DA activity is correlated with task performance preferentially when there is a working memory component. The reward response declined during the permanent cue mazes, and remained relatively consistent during the transient cue mazes; this is consistent with an RPE signal, as RPE implies negative modulation of reward responses by reward expectation (and reward expectation is related to task performance). c, Proportion of neurons that were significantly modulated by the different behavioural variables throughout learning (see Methods). Shaded colours show the 1 s.d. confidence intervals for a binomial distribution calculated using the Jeffreys method. d, Details of the shaping procedure. The table lists the parameters of the mazes progressively used during the shaping of the behaviour.
The ‘permanent cues’ field indicates whether the cues were presented at the beginning of the trial; otherwise, each cue was presented when the mouse was 10 cm away from its location. ‘High- (and low-) cue-probability side mean’ indicates the mean of the Poisson distribution from which the number of cues presented on each side was drawn (at least one cue was always drawn); ‘none’ indicates that no cues were presented for the low-probability side on any trial in that maze. The mice were automatically advanced to the next maze if both of the following criteria were met: (1) their performance was above a predetermined threshold (‘minimum performance for advancing’ field) for a given number of trials (‘number of trials to calculate performance’ field); and (2) they completed at least n sessions in the current maze, in which n is given by the ‘minimum number of sessions for advancing’ field.
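The cue-count sampling and the advancement rule described for panel d can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the ‘at least one cue’ constraint is assumed to apply to the high-probability side and to be enforced by redrawing, and performance is assumed to be the fraction of correct trials over the most recent trials. The parameter values in the usage note are placeholders, not values from the table.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's method, suitable for small means)."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_cue_counts(high_mean, low_mean, rng):
    """Return (n_high, n_low) cue counts for one trial.

    Assumption: the 'at least one cue' rule is met by redrawing the
    high-probability side until it is non-zero. low_mean=None stands for
    'none' in the table (no cues on the low-probability side).
    How ties between the two sides were handled is not stated in the
    legend and is not modelled here.
    """
    n_high = 0
    while n_high < 1:
        n_high = sample_poisson(high_mean, rng)
    n_low = 0 if low_mean is None else sample_poisson(low_mean, rng)
    return n_high, n_low


def may_advance(trial_outcomes, sessions_completed,
                min_performance, n_trials, min_sessions):
    """Check both advancement criteria for the current maze.

    trial_outcomes: list of booleans (True = correct), oldest first.
    Assumption: performance is evaluated on the last n_trials trials.
    """
    if sessions_completed < min_sessions or len(trial_outcomes) < n_trials:
        return False
    recent = trial_outcomes[-n_trials:]
    # Criterion (1): performance strictly above the threshold.
    return sum(recent) / n_trials > min_performance
```

For example, `sample_cue_counts(2.0, None, random.Random(0))` draws counts for a maze with no low-probability-side cues, and `may_advance(outcomes, 2, 0.8, 10, 2)` tests a placeholder threshold of 80% over 10 trials with at least 2 sessions completed.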