The best way to respond flexibly to changes in the environment is to anticipate them. Such anticipation often benefits us if we can infer that a change has occurred, before we have actually experienced the effects of that change. Here we test for neural correlates of this process by recording single-unit activity in the orbitofrontal cortex in rats performing a choice task in which the available rewards changed across blocks of trials. Consistent with the proposal that orbitofrontal cortex signals inferred information, firing changes at the start of each new block as if predicting the not-yet-experienced reward. This change occurs whether the new reward is different in number of drops, requiring signalling of a new value, or in flavour, requiring signalling of a new sensory feature. These results show that orbitofrontal neurons provide a behaviourally relevant signal that reflects inferences about both value-relevant and value-neutral information about impending outcomes.
The best way to respond flexibly to changes in the environment is to anticipate them. In many cases, such anticipation requires us to infer that a change has occurred, before we have actually experienced the effects of that change. For example, suppose you see your boss striding into his office in the morning with a dark scowl on his face. You realize that he is in a bad mood, and consequently you can change your expectations for what he might say to you in your morning meeting. Note that because you are using inference, the sensory cue—the scowl—need not ever be directly associated with the changed value of his words. But you, having observed the hidden sign and made the inference, can anticipate what will happen next.
The orbitofrontal cortex (OFC) has been hypothesized to track such inferred states and to signal information about expected outcomes based on them1,2,3. Indeed, numerous studies have found that neural activity in OFC anticipates expected outcomes4,5,6,7,8,9,10,11,12. However, the question of what function this activity performs has not been fully answered. One idea is that anticipatory activity drives decision making by representing the expected or even economic value of the outcomes that are likely to ensue in a given situation6,13. However, most OFC neurons track specific features of outcomes, like flavour, or even their context, rather than value, and respond more generally to other events in the environment6,11,14,15,16,17. Furthermore, causal studies have reported that OFC manipulations disrupt value-guided decisions when they are based on inferences and not when they are based on simple comparisons of previously learned values13,18,19,20,21,22,23.
Here we sought to test for neural correlates of this inferential process by recording single-unit activity in OFC in rats performing a choice task in which, on each trial, the rats chose between two milk rewards, which varied across blocks of trials both in number of drops and flavour. We were specifically interested in how OFC neurons would change firing in anticipation of new rewards at block switches, before those rewards were actually experienced. Consistent with the proposal that OFC infers information about expected outcomes, firing in OFC changed at the start of each new block as if predicting the not-yet-experienced reward. This change predicted the speed with which choice behaviour was adjusted in that block, and it occurred whether the new reward was different in number of drops, requiring signalling of a new value, or in flavour, requiring signalling of a new sensory feature. These results show that OFC neurons infer both value-relevant and value-neutral information about impending outcomes.
Rats (n=6) were trained in an odour-guided choice task (illustrated in Fig. 1a) to respond at either a left or right well to receive a small (one drop) or a large (three drops) amount of chocolate or vanilla-flavoured milk. This task was similar to an odour-guided choice task that we have used previously14, except that we manipulated reward flavour in some blocks instead of reward delay. Importantly, we used amounts and concentrations that we established separately were equivalently preferred (see Fig. 1b). Response-reward contingencies were stable across blocks of ~60 trials, but switched unpredictably in reward number or reward flavour at block transitions. Contingencies were arranged so that (i) rewards in the two directions always differed in both number and flavour (for example, large chocolate versus small vanilla or large vanilla versus small chocolate), and (ii) number and flavour switches alternated according to the sequence number-flavour-number-flavour across the five blocks of every session. Free-choice and forced-choice trials, instructed by an odour delivered at the beginning of the trial, were intermixed within blocks but always had the same response-reward contingencies.
The purpose of this task was to compare neural activity after switches in the number of reward drops, in which the value of choices changed, with that after switches in reward flavour, in which there was no change in the value of the two choices. In accord with this distinction, as shown in Fig. 1c, number switches resulted in a fast-developing and sustained change in choice rate on free-choice trials, so that after ~38 trials rats were choosing the side with three drops of milk on ~80% of free-choice trials. The rate with which they made this switch was independent of flavour, and flavour transitions (chocolate to vanilla or vanilla to chocolate) caused no enduring change in choice rate (see figure captions for all statistics).
We also examined how reaction time and accuracy on forced-choice trials were influenced by shifts in reward number and flavour. As shown in Fig. 1c,d, reaction time, defined as the time from odour offset to odour port exit, was faster and forced-choice performance was more accurate when a large reward could be expected compared with when a small reward could be expected. Again, these effects were independent of reward flavour. Thus, rats’ performance in this task on both free- and forced-choice trials was sensitive to the number of reward drops, thereby reflecting the higher value of a large reward, but insensitive to their flavour, reflecting the similar value placed on the chocolate and vanilla flavours.
We recorded 831 single-units from OFC in six rats during these 94 behavioural sessions, which included all recording sessions in which all five blocks were completed. The six rats completed 3, 24, 25, 28, 5 and 9 sessions, in which we recorded 28, 170, 275, 246, 28 and 84 single-units, respectively (some rats had fewer than ten completed sessions because of broken electrodes or other technical problems). The locations of the recordings are shown in Fig. 2a. We were interested in reward-anticipatory activity; such activity has been reported frequently in OFC. Thus, we analysed activity in a 500 ms epoch as rats waited in the fluid port, immediately before the first drop of reward was delivered. We performed an analysis of variance (ANOVA) on firing rate for each unit during this epoch with factors reward number, reward flavour and reward location (left or right). We used forced-choice trials for this analysis, because they were equally distributed across the levels of these factors. Many neurons were selective for number or flavour during this epoch (Fig. 2b–f). Counting main effects of number along with number × flavour and number × location interactions, 278 neurons (33%) were number anticipatory and 198 (24%) were flavour anticipatory, including 124 (15%) with effects of both number and flavour or interactions between them. In addition, as has been reported previously, many neurons fired selectively in anticipation of rewards according to their location (384, or 46%, showed some effect of reward location), and the majority of neurons with effects of number or flavour also had some effect of location (173 of the 278 neurons with number effects, or 62%, also had location effects; 123 of 198 neurons with flavour effects, or 62%, also had location effects).
To determine whether neurons in these anticipatory populations encoded reward value or simply the features of available rewards, we calculated a number and flavour index for each neuron. This index was simply the difference in average peak-normalized firing rate during the reward anticipatory epoch, between the large and small or the chocolate and vanilla reward trials, respectively. As shown in Fig. 2c, both number- and flavour-selective populations were equally distributed between their two poles, indicating that OFC neurons were equally likely to fire selectively in anticipation of large versus small rewards or chocolate versus vanilla rewards (large, 126 neurons; small, 152 neurons; chocolate, 103 neurons; vanilla, 95 neurons). In addition, the average magnitude of the number index did not differ between large- and small-preferring populations, nor did the magnitude of the flavour index differ between chocolate- and vanilla-preferring populations. Thus, OFC seemed to contain similar representations of each variable reward feature, independent of their value relevance as indexed by behaviour.
We next asked how these neurons changed their firing rates across number and flavour transitions. Because we were particularly interested in whether they would signal information about expected outcomes before direct experience with them in a new block, we took advantage of one salient feature of our task, which was that a change in reward on one side could be used to predict the features of the new reward on the other side. For example, the sudden receipt of a large chocolate reward on the left meant that a small vanilla reward would now be available on the right. Thus, when a rat received a new reward on one side at the beginning of a new block, it could immediately infer the features—both size and flavour—of the new reward available on the other side. To test whether this inference was evident in the firing rate of outcome-anticipatory OFC neurons, we identified which direction the rats had first received reward after each block switch, and then examined neural activity on the first response in the other direction—that is, the first trial in the off-direction. For simplicity, we will call this trial the ‘inference trial’. We eliminated all cases in which a free-choice trial occurred before the inference trial on the same side, because that free-choice trial would have had the same reward as one on the inference trial. Across all block switches, an average of 2.1±0.1 rewarded trials occurred in the first direction before the critical inference trial occurred in the off-direction.
We examined activity in all neurons that were number- or flavour-anticipatory by an ANOVA, using only trials in the second half of blocks (so as not to pre-select the population for early-block selectivity, which we were examining here). We tested activity across block switches in which a preferred reward feature for a particular neuron occurred in the block before or after the switch, comparing the firing rate on the inference trial with that at the end of the previous block (see Methods). If the population signalled the inference, then in the latter case (preferred feature after switch) firing rate might be expected to increase on the inference trial, whereas in the former case (preferred feature before switch) firing rate might be expected to decrease on the inference trial. As illustrated in Fig. 3 for single-unit examples and Fig. 4 for the population, we found exactly this pattern of firing for both number- and flavour-selective neurons. For example, the unit in Fig. 3a was selective late in blocks for the small reward (1 drop) at the left well. When the reward on the left changed from three drops to one drop, the neuron fired phasically in anticipation of the new small reward the first time one drop was delivered on the left (that is, before it was directly experienced). The only basis for making this prediction would be knowledge of the associative structure of the task in combination with the memory of having received a new three-drop reward at the right well on the past few trials. These two pieces of information could be used by the rat to infer that it would receive a one-drop reward the next time it responded at the left well. Importantly, such changes in firing were only observed on the inference trial; if we instead analysed activity on the final trial of the previous block (that is, one trial before the inference trial), there was no significant change in activity (Fig. 5). Finally, we performed a second control analysis, shown in Fig. 6, in which number-selective neurons were tested across flavour block switches, and vice versa; neither of these conditions showed an effect of block switch. A direct comparison of the two control conditions with the inference condition also showed that the observed pattern of changes in activity was specific to the inference trial (statistics for this comparison are in the legend for Fig. 6).
Interestingly, the inference signal observed in OFC neurons was not tightly locked to the rats’ behaviour. For number switches, OFC neurons seemed to infer the outcome before the rats’ behaviour clearly reflected this knowledge, at least according to our standard behavioural metrics. Rats took about 12 rewarded trials (6 in each direction) after a block switch to reach a 50% choice rate (that is, choosing both sides equally). In contrast, the inference signal was detectable before the first time a new reward was delivered in the off-direction. In addition, for flavour switches, the inference signal seemed to be completely dissociated from behaviour, because rats did not change their behaviour in any obvious way in response to flavour changes. This suggests that the neural inference signal was not a consequence of behaviour, but it also calls into question whether the signal has any behavioural relevance.
To examine this issue, we looked at choice rate on blocks in which we observed strong inference signals versus those in which we observed weak inference signals, by calculating an inference score for each number-selective neuron across each included block switch. This score was simply the difference between the normalized firing rate on the inference trial of the new block minus that in the same direction at the end of the previous block (see Methods). As shown in Fig. 7a, when the number of drops changed, a large inference score was associated with a rapid switch in choice behaviour. This difference was most pronounced in the first five trials of the block, but was significant until about trial ten. Accordingly, across the first ten trials of the block, we found a significant positive correlation between the neural inference score of the single-units and the rats’ choice rates (R=0.28, P<0.01, Pearson correlation). This correlation was specific for activity on the inference trial. As shown in Fig. 7b, when we did a parallel analysis using the last trial of the previous block in place of the inference trial, there was no correlation with choice behaviour in the new block (R=−0.031, P=0.75, Pearson correlation).
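The score and correlation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in the study (which was run in Matlab); the function names `inference_score` and `pearson_r` are hypothetical, and any sign alignment of the score to each neuron's preferred reward feature is assumed to have been applied beforehand.

```python
from math import sqrt
from typing import Sequence


def inference_score(inference_rate: float, previous_rate: float) -> float:
    """Normalized firing rate on the inference trial minus the normalized
    rate in the same direction at the end of the previous block."""
    return inference_rate - previous_rate


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (>= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative values: one score per unit and block switch, paired
# with the choice rate over the first ten trials of the new block.
scores = [inference_score(0.75, 0.25), inference_score(0.5, 0.5)]
```

Here each score would be paired with the choice rate from the same block switch before calling `pearson_r`; the significance test reported in the text is not reproduced in this sketch.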
The relationship between the inference signal and the rats’ choice performance can also be shown if we divide sessions by whether the rats switched quickly to the large reward at the start of the block (choice rate in first ten trials above the median) or only more slowly (choice rate in first ten trials below the median). As illustrated in Fig. 8, this analysis shows that neural inference was only observed at the start of fast-switching blocks and not at all at the start of slow-switching blocks. Again, this difference was specific for the inference trial: when we compared activity either on the last trial of the previous block or at the end of the new block, fast-switching and slow-switching blocks did not differ in OFC reward selectivity.
Given the relationship between the number inference signal and behaviour, we next wondered whether flavour inferences had any relationship with behaviour. Although flavour changes had no enduring effect on choice rate, a small downward deflection in choice rate appeared to occur immediately after flavour switches (Fig. 1c). We tested whether this change was significant by examining choice rate in the first two free-choice trials after flavour switches. Indeed, there was a small but significant decrease in the choice rate towards the large reward (or, equivalently, an increase in choice rate towards small reward) in those first two free-choice trials, as compared with the last two free-choice trials of the previous block. This difference disappeared in the next two free-choice trials of the new block (last two choice trials of previous block: 15±2.0% small choice; first two of new block: 22±2.5%; next two of new block: 17±2.3%; a within-subjects ANOVA on choice rate, with factor trial-pair found a main effect of trial pair: F2,374=3.2, P<0.05; planned comparison between last two trials and first two trials: F1,187=6.3, P<0.05; planned comparison between last two trials and second two trials: F1,187=0.5, P=0.47).
To test whether this behavioural effect of flavour switches was related to the flavour inference neural signal, we divided flavour block switches by whether the first two free-choice trials after the switch had at least one choice of the small reward, or whether they had none (this split was in effect a median split, because the median number of small choices in the first two was zero). We re-examined neural activity in the same flavour anticipatory population analysed earlier across flavour switches, this time testing each side of the split separately. As shown in Fig. 9, in the small-choice group, inference signalling was significant on the inference trial, but did not change significantly between the inference trial and the end of the block. Conversely, in the zero-small-choice group, neurons did not show significant changes in activity on the inference trial, and all updating occurred between the inference trial and the end of the new block. Thus, like number inference signalling, strong flavour inference signalling appears to be associated with a behavioural change that may indicate immediate awareness of the switch. However, we did not see a significant interaction between group and signalling on the inference trial, indicating that this behavioural measure was not as closely tied to the neural inference signal as the speed of choice updating was. We were also unable to test for a correlation with this behaviour, because almost all switches had either zero small choices or one small choice within these two free-choice trials.
The question of whether inference signalling reflects partial or complete updating of the number or flavour selectivity can be answered using the split data described above. For both number and flavour block switches, when rats showed behavioural evidence of being aware of the block change (that is, either fast choice switching after number switches or transient small-choices after flavour switches), no significant additional updating occurred between the inference trial and the end of the new block. Conversely, when there was less evidence that rats were aware of the block change, the inference trial itself showed no evidence of updating, and instead all significant updating happened between the inference trial and the end of the new block.
Finally, we asked whether any behavioural or neural factor might predict the strength of either the drop-number or flavour inference signal. We examined (i) the total number of trials in the previous block, (ii) the percent of block switches that were the first switch of the session, (iii) the choice rate at the end of the previous block and (iv) the total number of errors in the previous block. As detailed in Table 1, none of these factors were significantly different between block switches in which the inference signal was above the median as compared with when it was below the median. Likewise, as illustrated in the leftmost points in the top panels of Figs 8 and 9, the strength of drop-number or flavour selectivity at the end of the previous block failed to predict whether rats would switch choice rate quickly in the new block (for drop-number switches), or whether the first two free-choice trials would contain a small choice (for flavour switches).
The ability to predict likely outcomes or consequences is fundamental to normal adaptive behaviour and learning. This ability obviously benefits if changes in outcomes can be inferred based on an understanding of the rules or causal structure of the environment rather than requiring direct experience. Broadly speaking, this is the basis of the distinction between so-called habitual or model-free behaviour, in which responses are made based on pre-computed or cached values or ‘policies’, and goal-directed, outcome-guided or model-based behaviour, in which responses are based on values or policies that are arrived at on-the-fly through a process of mental simulation24,25,26,27.
Historically, the OFC has been strongly associated with emotions and associative behaviour28,29,30; however, in the past decade, accounts of this involvement have been increasingly dichotomized to reflect the division between signalling information derived solely from direct experience versus information that is inferred from access to relationships between events in the environment or task at hand. OFC is typically not necessary for the former but is always critical for the latter13,18,20,21,22,23. This has been interpreted as showing a role for the OFC in signalling an inferred or derived rather than a pre-computed value6,13.
Yet a growing number of accounts implicate the OFC not only in signalling inferred value but also in signalling value-independent information about impending events. This is evident in single unit and functional magnetic resonance imaging studies, in which neural signals in OFC represent features of impending outcomes—and in fact other predictable events—even when they are value neutral6,11,14,15,31. And in rats there exists clear evidence that the OFC is necessary for behaviour and learning that reflects these specific sensory features18,19,32. These data raise the possibility that the OFC may not only represent value, even inferred, but also instead may be a fundamental part of the circuit that represents the associative structure of the environment33.
Here we provide single-unit evidence consistent with this proposal. Specifically, we found that single units in rat OFC fired in anticipation of expected outcomes in a way that reflected both value-based and value-neutral features of the outcomes. Indeed, the strength or directionality of the correlates did not appear to be especially tied to whether or not the features had implications for value. As such, these results are consistent with a variety of prior reports that have emphasized the signalling of value-neutral associative features in the OFC11,15,16,17. Of course, information about value-neutral features could still be used to derive value or to drive goal-directed behaviour, but regardless of its use, these results suggest that OFC neurons represent a rich description of the specific features defining the outcome, including even the location or directional response required to obtain it.
Furthermore, this information was represented immediately at the start of new trial blocks, even before the predicted outcome features had been directly re-experienced. As noted earlier, such a prediction would require knowledge of the rules or associative structure of the task. Rats may use this knowledge to mentally simulate predicted outcomes (as model-based processing is usually conceived) or, alternatively, they may recognize which of the unique task ‘states’ (that is, the arrangement of outcomes in each block) they are currently in. Notably, both approaches require rats to represent the ‘state space’ or, equivalently, the associative structure of the task, in concert with more local information (recently experienced outcomes), to make predictions that in turn allow fast updating of choice behaviour. This linking together of associative information to promote flexible responding is consistent with the proposal that the OFC is part of a network involved in tracking hidden environmental states and making predictions about impending events33. This role may be one reason the OFC is important for behaviour in settings similar to the current task, which involve reversals21,30,34,35,36; however, it may also be related to OFC’s involvement in flexible behaviour that goes beyond immediate or even direct experience37,38,39,40,41.
Male Long-Evans rats were obtained at 175–200 g (~60 days old on arrival) from Charles River Labs. Rats were tested at the University of Maryland School of Medicine in accordance with School of Medicine and NIH guidelines.
Surgical procedures and histology
Surgical procedures followed guidelines for aseptic technique. Electrodes, consisting of drivable bundles of eight 25-μm diameter FeNiCr wires (Stablohm 675, California Fine Wire) electroplated with platinum to an impedance of ~300 kΩ, were manufactured and implanted as in prior recording experiments. Driveable electrodes were implanted in the left OFC (n=6; 3.0 mm anterior to bregma, 3.2 mm laterally, and, to begin, 4.0 mm ventral to the surface of the brain) in each rat. Four rats also had electrodes implanted in the ventral striatum; those data are not included in this report. At the end of the study, the final electrode position was marked, the rats were euthanized with an overdose of isoflurane and perfused, and the brains were removed from the skulls and processed using standard techniques.
Recording was conducted in aluminium chambers, on one wall of which was a panel with an odour port and two fluid wells arranged below it (see Fig. 1). The odour port was connected to an air flow dilution olfactometer to allow the rapid delivery of olfactory cues. The fluid wells were connected to fluid delivery lines containing flavoured milk (Nesquik brand chocolate or vanilla) diluted 50% with water. Delivery of odours at the odour port and the fluids at the fluid wells was controlled by a custom C++ programme interfaced with solenoid valves. Photobeam breaks at the port and wells were monitored and recorded by the programme. A houselight was also controlled by the programme.
Rats were trained extensively before implanting them with electrodes. After implantation, we retrained rats to work with the recording cable. Each training session included as many trials as a rat would perform before quitting, approximately 150–250. This initial shaping phase gradually introduced all elements of the task (described below), and thus rats could learn the associative structure of the task over this period. Recording was begun when rats could complete five blocks of trials (at least 260 trials) with the cable. Total number of pre-recording training sessions averaged 32.5 (ranging from 24 to 43).
Each recording session consisted of a series of self-paced trials organized into five blocks. Rats could initiate a trial by poking into the odour port while the house light was illuminated. Beginning 500 ms after the odour poke, an odour would be delivered for 500 ms. If the rat withdrew from the odour port before completion of the 1,000 ms pre-odour+odour period, the trial would be aborted and the houselight turned off. At the end of the odour, rats could respond by moving from the odour port to the left fluid well or the right fluid well, after which they had to wait for 500 ms before fluid delivery began; if they exited the well during this period, no fluid was delivered and the trial ended. The identity of the odour specified whether they could receive reward at the left well (forced-choice left), the right well (forced-choice right) or either well (free-choice). The identity and meaning of these odours remained the same across the entire experiment. Odours were presented in a pseudorandom sequence such that the free-choice odour was presented on 7/20 trials and the left/right odours were presented in equal numbers (±1 over 250 trials). In addition, the same odour could be presented on no more than three consecutive trials.
Rewards were either one bolus or three boli of chocolate or vanilla milk, with bolus size ~0.05 ml and 500 ms between boli. Response-reward contingencies were consistent within blocks of trials, such that the same reward would be delivered for every correct right response, either free or forced choice, and a different reward would be delivered for every correct left response, free or forced choice. The reward schedule was arranged so that in each block, reward features available on one side were always paired with the opposite reward features on the other side—thus when one drop of chocolate milk was available on the left, three drops of vanilla was available on the right, and so on, resulting in a total of four different reward combinations. On the first block, consisting of on average 43 (standard deviation 16) trials that were used to set the rats’ expectations before the first block switch, one of these combinations was randomly chosen. The subsequent four block transitions then followed, in order: (i) a drop-number transition, in which the side with one drop changed to three drops and vice versa, but the side-flavour contingencies remained the same; (ii) a flavour transition, in which the side with chocolate changed to vanilla and vice versa, but the side-number contingencies remained the same, (iii) another drop-number transition, (iv) another flavour transition. These block transitions were not explicitly signalled. The length of the last four blocks varied non-systematically around 65, with a standard deviation of 10.7 across the experiment.
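The block structure described above can be made concrete with a short Python sketch. It is only an illustration of the stated contingencies, not the custom C++ task programme; the dictionary layout and the names `switch_number`, `switch_flavour` and `session_schedule` are hypothetical, and the first block is fixed here rather than randomly chosen.

```python
from typing import Dict, List, Tuple

Reward = Tuple[int, str]      # (number of drops, flavour)
Block = Dict[str, Reward]     # maps well ("left"/"right") to its reward


def switch_number(block: Block) -> Block:
    """Drop-number transition: swap drop counts, keep side-flavour pairing."""
    (left_n, left_f), (right_n, right_f) = block["left"], block["right"]
    return {"left": (right_n, left_f), "right": (left_n, right_f)}


def switch_flavour(block: Block) -> Block:
    """Flavour transition: swap flavours, keep side-number pairing."""
    (left_n, left_f), (right_n, right_f) = block["left"], block["right"]
    return {"left": (left_n, right_f), "right": (right_n, left_f)}


def session_schedule(first_block: Block) -> List[Block]:
    """Five blocks: the first block followed by the four transitions
    number, flavour, number, flavour."""
    blocks = [first_block]
    for transition in (switch_number, switch_flavour,
                       switch_number, switch_flavour):
        blocks.append(transition(blocks[-1]))
    return blocks


# Example first block (assumed for illustration).
start: Block = {"left": (1, "chocolate"), "right": (3, "vanilla")}
schedule = session_schedule(start)
```

Under these rules the first four blocks visit all four reward combinations, and because the wells always hold opposite features, the reward at one well fully determines the reward at the other.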
During testing, rats were limited to 10 min of ad lib water each day, in addition to fluid earned in the task.
Flavour preference testing
In six rats from a separate experiment (same strain and source, and same water restriction regimen), we compared consumption of the chocolate versus vanilla milk solution in two-bottle tests. All rats were tested for a total of 10 min, with the location of the bottles swapped every 30 s. Two rats were given five 2-min tests, whereas the other four rats were given one 10-min test each.
Procedures were the same as described previously42. Wires were screened for activity daily; if no activity was detected, the rat was removed and the electrode assembly was advanced 40 or 80 μm. Otherwise a session was conducted, and the electrode was advanced by at least 40 μm at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Plexon Inc.), interfaced with odour discrimination training chambers. Signals from the electrode wires were amplified and filtered by standard procedures described in the previous studies. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded with event timestamps sent by the behavioural programme. Waveforms were not inverted before data analysis.
Units were sorted using Offline Sorter software from Plexon Inc., using a template-matching algorithm. Sorted files were then processed in Neuroexplorer to extract unit timestamps and relevant event markers. These data were subsequently analysed in Matlab.
To analyse reward anticipatory activity, we examined firing rate in the 500 ms epoch between fluid well entry and the first bolus of reward delivery. We performed an ANOVA (P<0.05) on each neuron’s firing rate during this epoch, with baseline (from houselight on to odour initiation) subtracted, using factors number of reward drops (one or three), reward flavour (chocolate or vanilla) and well location (left or right). In the initial ANOVA, we included all correct forced-choice trials across the last four blocks (across which all factors were completely crossed and balanced). Every neuron with a number or flavour effect, regardless of whether it showed an inhibitory or excitatory response, was included.
To measure the degree of number- and flavour-selectivity, we calculated a number and flavour index for each neuron. To do so, we peak-normalized each neuron by dividing all firing rates by a normalization factor, which was the peak firing rate across the trial in 500 ms bins after averaging across the first ten and last ten trials of each of the eight number-flavour-well conditions in the last four blocks. We then calculated the number and flavour indices as:
number index=average peak-normalized firing rate on all three-drop trials−average peak-normalized firing rate on all one-drop trials
flavour index=average peak-normalized firing rate on all chocolate trials−average peak-normalized firing rate on all vanilla trials.
For neurons with significant interaction effects, the above formulas would not reflect the actual degree of their selectivity, so for those neurons, we calculated separate indices for each level of the interacting factor, and then took the index with the highest absolute value. For instance, for a neuron with a flavour × direction interaction, we calculated separate flavour indices for the left and right wells, and then assigned whichever of these was greater in magnitude as the flavour index for that neuron.
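As a sketch of the index calculation described above, the following assumes that each trial has already been reduced to a single firing rate and that the normalization factor (`peak_rate`) has been computed separately. The data layout and function names are hypothetical. For a neuron with an interaction, the index is computed at each level of the interacting factor and the value of greatest magnitude is kept.

```python
from statistics import mean

def selectivity_index(trials, factor, level_a, level_b, peak_rate):
    """Mean peak-normalized rate at level_a minus that at level_b."""
    a = [t["rate"] / peak_rate for t in trials if t[factor] == level_a]
    b = [t["rate"] / peak_rate for t in trials if t[factor] == level_b]
    return mean(a) - mean(b)

def index_with_interaction(trials, factor, level_a, level_b, peak_rate, other):
    """Index computed separately at each level of the interacting factor
    `other`; the one with the largest absolute value is returned."""
    levels = sorted({t[other] for t in trials})
    per_level = [
        selectivity_index([t for t in trials if t[other] == lv],
                          factor, level_a, level_b, peak_rate)
        for lv in levels
    ]
    return max(per_level, key=abs)

# Toy neuron: selective at the left well only (made-up rates).
trials = [
    {"rate": 8.0, "drops": 3, "flavour": "chocolate", "well": "left"},
    {"rate": 4.0, "drops": 1, "flavour": "chocolate", "well": "left"},
    {"rate": 6.0, "drops": 3, "flavour": "vanilla", "well": "left"},
    {"rate": 2.0, "drops": 1, "flavour": "vanilla", "well": "left"},
    {"rate": 5.0, "drops": 3, "flavour": "chocolate", "well": "right"},
    {"rate": 5.0, "drops": 1, "flavour": "chocolate", "well": "right"},
    {"rate": 5.0, "drops": 3, "flavour": "vanilla", "well": "right"},
    {"rate": 5.0, "drops": 1, "flavour": "vanilla", "well": "right"},
]
peak = 10.0
number_index = selectivity_index(trials, "drops", 3, 1, peak)
flavour_index = selectivity_index(trials, "flavour", "chocolate", "vanilla", peak)
flavour_by_well = index_with_interaction(
    trials, "flavour", "chocolate", "vanilla", peak, other="well")
```

With these made-up rates, the pooled flavour index (0.1) understates the selectivity seen at the left well (0.2), which is the situation the per-level calculation is meant to handle.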
To analyse neural activity reflecting inferences, we identified the well at which reward was first delivered after each block switch, and then identified the first forced-choice trial that was rewarded at the other well in that block, eliminating cases in which a free-choice trial was rewarded at that well before the first forced-choice trial. For simplicity, we will call this trial the inference trial. The inference trial was therefore the first trial of the block in which reward was delivered at that well. We compared reward-anticipatory activity on the inference trial with activity in forced-choice trials in the same direction at the end of the previous block. For the reported analysis, this previous-block activity was the average of the last three trials of the previous block, not including the final trial; we excluded the final trial so that we could examine activity on that trial as a control. The results remained qualitatively the same regardless of exactly how many trials from the previous block we used. Because drop-number-selective neurons tended to ramp up firing before the rat entered the fluid well to wait for reward, for these neurons we changed the epoch to begin 100 ms before fluid well entry, still ending upon delivery of the first drop of reward.
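The selection rule for the inference trial can be sketched as follows. This is an illustrative reading of the description, assuming two wells and a block represented as a chronological list of trials; the field names (`well`, `forced`, `rewarded`) and the function name are hypothetical.

```python
def find_inference_trial(block_trials):
    """Return the position of the inference trial in a block, or None if
    the block does not qualify.

    block_trials: chronological list of dicts with keys
      'well' (e.g. 'left'/'right'), 'forced' (bool), 'rewarded' (bool).
    """
    rewarded = [(i, t) for i, t in enumerate(block_trials) if t["rewarded"]]
    if not rewarded:
        return None
    first_well = rewarded[0][1]["well"]
    for i, t in rewarded:
        if t["well"] == first_well:
            continue
        # This is the first rewarded trial at the other well. It is the
        # inference trial only if it was forced-choice; if it was a
        # free-choice trial, the block is excluded.
        return i if t["forced"] else None
    return None

qualifying_block = [
    {"well": "left", "forced": True, "rewarded": True},
    {"well": "left", "forced": False, "rewarded": True},
    {"well": "right", "forced": True, "rewarded": False},
    {"well": "right", "forced": True, "rewarded": True},
]
excluded_block = [
    {"well": "left", "forced": True, "rewarded": True},
    {"well": "right", "forced": False, "rewarded": True},
    {"well": "right", "forced": True, "rewarded": True},
]
idx_ok = find_inference_trial(qualifying_block)
idx_excluded = find_inference_trial(excluded_block)
```

In the second toy block, reward at the right well was first experienced on a free-choice trial, so no inference trial is returned.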
To define the population to be tested for inference signalling, we first redid the ANOVA, this time only including forced-choice trials after the first 20 in each block (the last half of blocks). We did this in order to eliminate statistical bias towards finding inference, which would occur at the beginning of blocks, but results were similar using the original ANOVA. We tested for inference signalling across all neurons with a significant effect of number, flavour or interactions of these factors. For main-effect neurons, each neuron’s preferred reward feature was defined as the feature for which it fired most across the trials that went into the ANOVA. For instance, a neuron with a main effect of number was defined as three-drop preferring if it fired most across trials (after the first 20 of blocks) in which the three-drop reward was delivered. For interaction effects, we did follow-up ANOVAs at each level of the interacting factor to determine whether that neuron had a significant effect at each level, and, if so, which reward feature it preferred at that level. For instance, for a neuron with a significant flavour × direction effect, we tested for a flavour effect separately at the left well and the right well, and, for each significant effect, defined whether that neuron preferred chocolate or vanilla at that well. Then, for inference testing, we only included a neuron when the block transition in the analysed direction consisted of a switch from that neuron’s preferred to its anti-preferred reward feature, or from its anti-preferred to its preferred reward feature.
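A minimal sketch of the last two steps, assigning a preferred feature and checking whether a block transition qualifies, is given below. It assumes a two-level factor and trials already restricted to those that entered the ANOVA; the significance testing is not shown. The function names, field names and the labels `anti_to_pref` and `pref_to_anti` are hypothetical.

```python
from statistics import mean

def preferred_level(trials, factor):
    """Level of `factor` with the highest mean firing rate across trials."""
    levels = sorted({t[factor] for t in trials})
    return max(levels,
               key=lambda lv: mean(t["rate"] for t in trials if t[factor] == lv))

def switch_direction(prev_level, new_level, preferred):
    """Classify a block transition relative to the neuron's preferred level.

    Returns 'anti_to_pref', 'pref_to_anti', or None when the feature did
    not change across the transition (the transition does not qualify).
    """
    if prev_level == new_level:
        return None
    if new_level == preferred:
        return "anti_to_pref"
    if prev_level == preferred:
        return "pref_to_anti"
    return None

# Toy neuron firing more for three drops than for one (made-up rates).
late_trials = [
    {"rate": 6.0, "drops": 3}, {"rate": 8.0, "drops": 3},
    {"rate": 2.0, "drops": 1}, {"rate": 4.0, "drops": 1},
]
pref = preferred_level(late_trials, "drops")
direction_up = switch_direction(1, 3, pref)
direction_down = switch_direction(3, 1, pref)
direction_none = switch_direction(3, 3, pref)
```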
We also did two control analyses to test whether observed patterns of changes were specific to inference: (i) for each neuron, we replaced the inference trial with the final trial of the previous block, leaving all other factors the same, and (ii) for each neuron, we examined the change across the opposite block change from that in which the inferential change was measured (for example, for a flavour inference measured on the first flavour block change, we examined its change across the first number block change).
For correlations with behaviour, we calculated an inference score for each neuron across each qualifying block switch, defined as follows:
inference score=(sign-factor)*(peak-normalized firing rate during the analysed epoch of the inference trial−average peak-normalized firing rate in the analysed epoch over the last 10 trials (not including the last trial) of the previous block in the same direction)
sign-factor=1, for anti-preferred → preferred switches
sign-factor=−1, for preferred → anti-preferred switches
We used ten trials from the previous block in the inference score (equation 3) in order to reduce trial-to-trial variability in the score as much as possible, but, again, the exact number of trials from the previous block did not change the results qualitatively. As a control, we tested a parallel correlation in which we substituted the last forced-choice trial from the previous block for the inference trial. We also calculated selectivity at the end of the new block, in parallel to the inference score, as follows:
end-of-block selectivity=(sign-factor)*(average peak-normalized firing rate over the last 10 trials (not including the last trial) of the new block in the same direction as the inference trial−average peak-normalized firing rate over the last 10 trials (not including the last trial) of the previous block in the same direction).
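The inference score and the end-of-block selectivity can be sketched together, since they share the same previous-block reference and sign factor. This sketch assumes that "the last 10 (not including the last trial)" means dropping the final trial and then averaging the ten trials that precede it. That reading, along with all function names and switch labels, is an assumption made for illustration. Rates are taken to be peak-normalized already and listed in chronological order.

```python
from statistics import mean

# Sign factor: +1 for anti-preferred -> preferred, -1 for the reverse.
SIGN = {"anti_to_pref": 1, "pref_to_anti": -1}

def reference_mean(rates, n=10):
    """Mean over up to n trials preceding the final trial.

    The final trial is excluded (assumed reading of the text).
    Raises statistics.StatisticsError if no trials remain.
    """
    return mean(rates[:-1][-n:])

def inference_score(inference_rate, prev_rates, switch):
    """Signed change from the end of the previous block to the inference trial."""
    return SIGN[switch] * (inference_rate - reference_mean(prev_rates))

def end_of_block_selectivity(new_rates, prev_rates, switch):
    """Signed change from the end of the previous block to the end of the new block."""
    return SIGN[switch] * (reference_mean(new_rates) - reference_mean(prev_rates))

# Toy data: same-direction forced-choice trials, final trial is an outlier
# that the reference window ignores.
prev_rates = [0.2] * 11 + [0.9]
new_rates = [0.6] * 11 + [0.1]
score_up = inference_score(0.5, prev_rates, "anti_to_pref")
score_down = inference_score(0.5, prev_rates, "pref_to_anti")
selectivity = end_of_block_selectivity(new_rates, prev_rates, "anti_to_pref")
```

Under this convention a positive score means that firing moved in the direction expected from the neuron's preference, whichever way the block switched.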
When data for correlations contained two block changes for the same neuron, those data points were averaged so that each data point in the correlation came from a unique neuron.
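One way to collapse repeated block changes into a single data point per neuron is sketched below. It assumes each data point is a pair (neural score, behavioural measure) and that both coordinates are averaged. The text does not spell this out, so treat it as an assumption; the record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_per_neuron(records):
    """Collapse (neuron_id, score, behaviour) records to one point per neuron.

    Returns a dict mapping neuron_id -> (mean score, mean behaviour).
    """
    grouped = defaultdict(list)
    for neuron_id, score, behaviour in records:
        grouped[neuron_id].append((score, behaviour))
    return {
        neuron_id: (mean([s for s, _ in pairs]), mean([b for _, b in pairs]))
        for neuron_id, pairs in grouped.items()
    }

# Toy records: neuron n1 contributed two block changes, n2 only one.
records = [("n1", 0.2, 0.6), ("n1", 0.4, 0.8), ("n2", 0.1, 0.5)]
points = average_per_neuron(records)
```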
Statistics were done using Matlab, Excel and Statistica. Planned comparisons were used for testing specific effects of multi-way ANOVAs. For displays of neural activity, bin-size was 100 ms except when the inference trial was shown for single-units, in which case bin-size was 250 ms. Neural activity in the displays was smoothed with a boxcar algorithm, with a 9-bin boxcar used for single-unit plots outside of the analysed epoch (3-bin boxcar for the inference trial on single-units), and either a 3-bin boxcar (for Fig. 2) or no smoothing (Fig. 3) being used within the analysed epoch. For population average activity displays (Figs 4, 5 and 7), a 5-bin boxcar was used. To make plots of trial-by-trial choice rate, we first aligned all rewarded trials for all blocks, and for each trial (1st, 2nd, 3rd and so on after the block switch), we took the proportion of choices towards the side with the three-drop reward, excluding all blocks in which a forced-choice trial happened to occur on that trial. For Fig. 1, the line was then smoothed using a 3-bin boxcar, separately for before and after the switch.
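The boxcar smoothing and the trial-by-trial choice rate can be illustrated with a short sketch. The text does not say how bins at the edges were handled; here the window is simply truncated at the ends, which is an assumption. The field names (`forced`, `chose_three_drop`) and function names are hypothetical.

```python
def boxcar_smooth(values, width):
    """Centred moving average with an odd window width.

    At the edges the window is truncated to the available bins
    (an assumed edge rule, not taken from the text).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def choice_rate(blocks, n_positions):
    """Proportion of free choices towards the three-drop side at each
    trial position, skipping blocks with a forced-choice trial there.

    blocks: list of blocks, each a list of rewarded trials aligned to
    the block switch. Positions with no free-choice data give None.
    """
    rates = []
    for pos in range(n_positions):
        picks = [b[pos]["chose_three_drop"] for b in blocks
                 if pos < len(b) and not b[pos]["forced"]]
        rates.append(sum(picks) / len(picks) if picks else None)
    return rates

smoothed = boxcar_smooth([0, 3, 6, 3, 0], 3)

block_1 = [{"forced": False, "chose_three_drop": True},
           {"forced": True, "chose_three_drop": True}]
block_2 = [{"forced": False, "chose_three_drop": False},
           {"forced": False, "chose_three_drop": True}]
rates = choice_rate([block_1, block_2], 2)
```

To smooth separately before and after the switch, as described for Fig. 1, `boxcar_smooth` would be applied to the pre-switch and post-switch segments as two independent lists, so that no window spans the switch.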
The opinions expressed in this article are the authors' own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government.
This work was supported by funding from NIDA.