Agency rescues competition for credit assignment among predictive cues from adverse learning conditions

A fundamental assumption of learning theories is that the credit assigned to predictive cues is not simply determined by their probability of reinforcement, but by their ability to compete with other cues present during learning. This assumption has guided behavioral and neural science research for decades, and tremendous empirical and theoretical advances have been made identifying the mechanisms of cue competition. However, when learning conditions are not optimal (e.g., when training is massed), cue competition is attenuated. This failure of the learning system exposes the individual’s vulnerability to form spurious associations in the real world. Here, we uncover that cue competition in rats can be rescued when conditions are suboptimal provided that the individual has agency over the learning experience. Our findings reveal a new effect of agency over learning on credit assignment among predictive cues, and open new avenues of investigation into the underlying mechanisms.


Results
Agency rescues the blocking effect from the deleterious effects of massed training. We first set out to test whether agency over learning rescues the blocking effect 7 from the deleterious effects of massed training in rats (Fig. 1B). This and the remainder of the studies employed a within-subject design embedded within a between-subject, master-yoked procedure (n = 8; Fig. 1A). Master rats (Agency group) were allowed to self-initiate their trials by performing a nose poke into a nose port at any point during a period of trial availability (max = 20 s) signaled by a nose-port light 28 . On any given trial, a nose poke would turn on one or more 10-s cues of the visual and auditory modality. In contrast, yoked rats (Passive group) received an identical sequence of events to their master counterparts (including the trial availability cue), but trial presentations were noncontingent on their behavior (i.e., standard Pavlovian conditioning).
In both groups, a sucrose reward delivered in a dipper cup was made available on reinforced trials immediately after the termination of the cues. Conditioned responding was measured as the number of anticipatory head entries made by the rat at the dipper recess during the last 5 s of cue presentation 32 ("Materials and Methods"). Critically, the ITI was programmed to be only 10 s on average (range 5-15 s). Since Agency rats could forgo trial offers, the mean ITI was effectively longer ("Materials and Methods"), but still considerably shorter than the mean ITI typically used in studies of conditioned magazine approach featuring 10-s cues (in the order of minutes).
Figure 1. Agency over learning rescues competitive credit assignment from the deleterious effects of massed training in a blocking task. (A) Trial structure in this and the remainder of the studies. On each trial offer, a nose-port light was presented in both groups signaling trial availability to Agency rats. A trial-initiating response (nose poke) by an Agency rat immediately resulted in a 10-s cue being presented to that rat as well as to its yoked animal in the Passive group. On reinforced trials, a sucrose US was presented at the dipper magazine following cue offset. Trial offers were separated by a 10-s variable intertrial interval (ITI). (B) Experimental design. Letters A and B denote visual stimuli, whereas X and Y denote auditory stimuli. Digits in brackets represent the probability of reward for each trial type. The pretraining phase involved a simple discrimination between A and B. During the compound phase, these trials were interleaved with compounds AX (where A should block X) and BY (where B should not block Y), both continuously reinforced. To test for blocking (i.e., less responding to X than Y), two daily probe trials with X and Y were introduced on session 9 of the Compound phase. (C,D) Behavioral results in groups Agency and Passive, respectively. The left and center line plots depict performance during the Pretraining and Compound phases, respectively. The bar graphs on the right show average responding to X and Y on probe trials across the last four sessions. Conditioned responding is measured as mean number of head entries (± SEM).
Both groups underwent two phases of training. In the first phase, rats were pretrained with a simple discrimination involving two visual cues, A and B. Cue A, which would serve as the blocking stimulus in the following phase, was reinforced with a probability of 1 [henceforward symbolized by A(1)], while B was never reinforced [B(0)]. Training continued for 14 days to allow the opportunity for asymptotic discrimination learning (Fig. 1, Panels C and D, left). A group x session block x cue mixed ANOVA revealed a significant main effect of cue (F (1,182) = 398.24, p < 0.001) and group (F (1,14) = 8.10, p = 0.013), and a group by cue interaction (F (1,182) = 15.65, p < 0.001). Bonferroni-corrected post-hoc analyses revealed that this interaction was likely driven by the slightly higher level of responding on A(1) trials in the Passive group (t (17.1) = − 3.95, p = 0.006), as both groups significantly discriminated between A(1) and B(0) (Agency: t (182) = 11.31, p < 0.001; Passive: t (182) = 16.91, p < 0.001).
In the second, compound phase (Fig. 1, Panels C and D, center), training with A(1), B(0) continued for another 20 sessions, but, in addition, two novel auditory cues, X and Y, were presented in compound with A and B on separate, reinforced trials. Specifically, X accompanied A as the stimulus to be blocked, whereas Y accompanied B as the control cue for blocking [AX(1), BY(1)]. Panels C and D (center) of Fig. 1 show that in both groups the compounds evoked similar levels of conditioned responding as A. A group x session block x cue mixed ANOVA revealed a main effect of cue (F (3,266) = 187.35, p < 0.001), session block (F (4,266) = 4.28, p = 0.0002) and group (F (1,14) = 7.07, p < 0.019), and a group by cue interaction (F (3,266) = 10.33, p < 0.001). Bonferroni-corrected post hoc analyses revealed that both groups successfully discriminated between reinforced and nonreinforced trial types. In order to monitor the emergence of blocking (i.e., less responding to X than Y), two probe trials with each of X and Y were randomly interleaved daily from session 9 onward (Fig. 1, panels C and D, center). Inspection of the results suggests that a blocking effect emerged at the end of the compound phase in the Agency, but not the Passive group. This impression was confirmed by a further group x session x cue mixed ANOVA that focused on the mean responding to X and Y across the last four probe sessions (Fig. 1, panels C and D, right). This analysis revealed significant main effects of cue (F (1,98) = 7.80, p = 0.006) and group (F (1,14) = 6.42, p = 0.024), and a significant group by cue interaction (F (1,98) = 6.28, p = 0.014). Exploration of this interaction with Bonferroni-corrected simple main effects confirmed a significant difference in responding to the cues in the Agency (t (98) = 3.75, p < 0.002), but not the Passive group (t (98) = 0.20, p ≈ 1).
The results thus provide evidence that agency over learning rescues competitive credit assignment to cues from the adverse effects of massed trials.
Agency rescues competitive credit assignment in a novel cue competition task. To further examine the influence of agency on competitive credit assignment under massed trials, we next compared the performance of Agency and Passive groups in a novel cue competition design. This design creates a conflict between the pattern of responding to two cues, X and Y, expected when credit assignment is competitive and the pattern expected when it is noncompetitive. By creating this conflict, the design maximizes the chances of detecting differences between competitive and noncompetitive learning, making it ideally suited for examining the full impact of behavioral and neural manipulations on credit assignment. Given the novelty of the design, we piloted it in a standard Pavlovian magazine-approach setting with spaced trials (Supplementary Materials, Exp. S1).
The details of the experimental design are shown in the table of Fig. 2A. Two groups were trained with the same master-yoked procedure used in the previous study (Fig. 1A). In the pretraining phase (Fig. 2, panels B and C, left), rats received 10 sessions of discrimination training with two visual cues, A(1) and B(0), and two auditory cues, X(0.75) and Y(0.25), where, once again, the numbers in parentheses represent the probability of reward associated with each cue. A group x session block x cue mixed ANOVA revealed a main effect of cue (F (3,266) = 24.27, p < 0.001) and a cue by session block interaction (F (12,266) = 2.03, p = 0.022). Bonferroni-corrected post-hoc analysis of this interaction revealed that the discrimination between A(1) and B(0) was solved from session block 3 onward (t (266) = [3.78-4.64], p < 0.01), and that between X(0.75) and Y(0.25) from session block 4 onward (t (266) = [2.94-3.06], p < 0.04). No effect of group nor any interaction involving that factor was found.
In the second, compound phase, cues X and Y had the same probability of reward as in the previous phase, but were subject to opposing competing forces (Fig. 2, Panels B and C, right). Specifically, X was presented in compound with A on the 75% of trials in which it was followed by reward [3AX(1), X(0); where 3 indicates the proportion of trials]. This allowed A to compete with X as a predictor of reward and steal its credit 33 . A's ability to serve as a competitor was further bolstered by continuing to present it by itself followed by reward [A(1)]. In addition, cue Y was presented in compound with B on the 75% of trials in which Y was not reinforced [3BY(0), Y(1)], allowing B to compete with Y for predicting reward omission. Casually put, this training was intended to ensure that B rather than Y would take the blame for the omission of reward on 3BY(0) trials 34 . Throughout this phase, B(0) trials continued to be presented.
A key advantage of this design is that X(0) and Y(1) trials permit online monitoring of the impact of competition on responding to these cues. If credit assignment is noncompetitive 35 , X should be expected to evoke more responding than Y given its higher probability of reward. Conversely, to the extent credit assignment is competitive, Y should be expected to evoke more responding than X 33 . To examine the role of agency over learning, we focused our analysis on responding on X(0) and Y(1) trials. Inspection of Fig. 2 […] cue by session block (F (9,266) = 2.19, p < 0.001), and group by cue by session block interactions (F (9,266) = 2.16, p = 0.025). A Bonferroni-corrected post-hoc analysis of the three-way interaction revealed that, consistent with competitive credit assignment, rats in the Agency group responded to Y significantly more than to X on session blocks 9 (t (266) = 3.69, p < 0.002) and 10 (t (266) = 3.78, p < 0.002). In contrast, in the Passive group, the difference between X and Y was marginally significant only on session block 9, but in the opposite, noncompetitive direction (X > Y) (t (266) = 1.97, p = 0.05).
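The opposing predictions for X and Y described above can be illustrated with a minimal simulation sketch. It is not the authors' analysis: it assumes a summed-error (Rescorla-Wagner-type) rule for the competitive case and a per-cue error rule for the noncompetitive case, omits the Pretraining phase, and uses a hypothetical learning rate and a hypothetical trial cycle in which A(1) and B(0) each occur once per cycle (their relative frequency is an assumption).

```python
# One Compound-phase cycle: (cues present, reward).
# 3AX(1):X(0) and 3BY(0):Y(1) follow the design; one A(1) and one B(0)
# trial per cycle is an assumption made for illustration.
CYCLE = (
    [(("A", "X"), 1.0)] * 3 + [(("X",), 0.0)] + [(("A",), 1.0)]
    + [(("B", "Y"), 0.0)] * 3 + [(("Y",), 1.0)] + [(("B",), 0.0)]
)


def train(competitive, n_cycles=400, alpha=0.05):
    """Return associative strengths after repeated Compound-phase cycles.

    competitive=True : all cues present share one summed prediction error.
    competitive=False: each cue is updated with its own prediction error.
    alpha and n_cycles are illustrative values, not fitted parameters.
    """
    v = {cue: 0.0 for cue in "ABXY"}
    for _ in range(n_cycles):
        for cues, reward in CYCLE:
            if competitive:
                error = reward - sum(v[c] for c in cues)
                for c in cues:
                    v[c] += alpha * error
            else:
                for c in cues:
                    v[c] += alpha * (reward - v[c])
    return v


competitive_v = train(competitive=True)
noncompetitive_v = train(competitive=False)
print("competitive    X=%.2f Y=%.2f" % (competitive_v["X"], competitive_v["Y"]))
print("noncompetitive X=%.2f Y=%.2f" % (noncompetitive_v["X"], noncompetitive_v["Y"]))
```

Under these assumptions the shared-error rule leaves Y stronger than X, whereas the per-cue rule leaves X stronger than Y, which is the conflict the design is meant to expose.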
A group x session block x cue mixed ANOVA on responding to the remainder of the cues in the compound phase revealed a significant main effect of cue (F (3,546) = 138.82, p < 0.001) and a group by cue interaction (F (3,546) = 3.23, p = 0.022). A Bonferroni-corrected post-hoc analysis, however, confirmed that both groups discriminated between A(1) and B(0) [Agency: t (546) = 10.81, p < 0.001; Passive: t (546) = 9.26, p < 0.001] as well as between 3AX(1) and 3BY(0) trials [Agency: t (546) = 8.79, p < 0.001; Passive: t (546) = 11.42, p < 0.001]. A likely contributor to this interaction was the greater responding observed on 3AX(1) than A(1) trials in the Passive (t (546) = − 3.86, p = 0.004), but not the Agency (t (546) = 0.47, p = 1) group. This difference can be explained by the fact that learning about X was not blocked in the Passive group, allowing this cue to contribute, along with A, to conditioned responding on AX compound trials (i.e., summation).
This study thus provides further evidence of a profound impairment in cue competition under massed Pavlovian training, rendering the rescuing effects of agency over learning all the more striking. One alternative interpretation, however, is that Agency rats did not apportion credit any more competitively than Passive rats, but rather treated cues X and Y as radically distinct when presented alone vs. in compound. In other words, Agency, but not Passive rats may have treated the compounds AX and BY as separate, configural stimuli distinct from their constituent elements. To rule out this interpretation, we used a variant of the above design that does not afford such an explicit configural solution.
Agency rescues competitive credit assignment in the absence of an explicit configural solution. Figure 3A shows the experimental design. In the pretraining phase, all rats received 8 sessions with a simple visual discrimination of the form A(1), B(0) (Fig. 3, panels B and C, left). A group x session block x cue mixed ANOVA revealed that both groups solved this discrimination as evidenced by greater magazine-approach responding on A(1) than B(0) trials over the course of training (main effect of cue: F (1,98) = 88.72, p < 0.001; cue by […]
Figure 2 (caption fragment). […] (1) and BY(0) compounds as of the rest of trial types. During the Pretraining phase, all rats received discrimination training between A and B and X and Y. In the Compound phase, these stimuli continued to be presented with the same probability of reward. However, X was presented in compound with A on the 75% of trials in which it was rewarded (allowing A to take away its credit), whereas Y was presented in compound with B on the 75% of trials in which it was not rewarded (allowing B to take the blame for reward omission). Trials with X(0) and Y(1) permitted continual monitoring of the predictive status of these stimuli as the discrimination developed. (B,C) Behavioral results in groups Agency and Passive, respectively. The left and right line plots depict performance during the Pretraining and Compound phases, respectively. Conditioned responding is represented as mean number of magazine head entries (± SEM).
In the second, compound phase, all rats continued to receive the A(1), B(0) discrimination, but novel compound trials AX(0.75) and BY(0.25) were introduced, where X and Y were again auditory stimuli ([…]). Note that, in the absence of competitive credit assignment, this training should result in X evoking more responding than Y, given their respective probabilities of reward (0.75 and 0.25). On the other hand, if credit assignment is competitive, more responding to Y than to X should be observed.
This is because X signals a decrement in the probability of reward if considered against the backdrop of A, which is otherwise always reinforced, whereas Y signals an increment in the probability of reward if considered against the backdrop of B, which is otherwise never reinforced. Thus, considered in the context of other cues present, Y is a better predictor of reward than X.
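This reasoning can be written out as a short worked example. It assumes a summed-error (Rescorla-Wagner-type) rule in which learning stops once the expected prediction error on every trial type is zero; the symbol V (associative strength of a cue) is introduced here for illustration and does not appear in the original text. At that fixed point, the summed strength of the cues present on each trial type equals that trial type's probability of reward:

```latex
\begin{align*}
\text{A(1):}\quad     & V_A = 1, &
\text{AX(0.75):}\quad & V_A + V_X = 0.75 \;\Rightarrow\; V_X = -0.25,\\
\text{B(0):}\quad     & V_B = 0, &
\text{BY(0.25):}\quad & V_B + V_Y = 0.25 \;\Rightarrow\; V_Y = +0.25.
\end{align*}
```

Under this competitive rule, V_Y > V_X. If instead each cue simply tracked its own probability of reward (a noncompetitive rule), one would obtain V_X = 0.75 and V_Y = 0.25, that is, V_X > V_Y.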
To test this prediction, we randomly interspersed two daily nonreinforced probe trials with X and Y [X(0), Y(0)] starting on session 13 (Fig. 3, panels B and C, right). A group x session block x cue mixed ANOVA on responding during the probe trials revealed a marginally significant effect of cue (F (1,126) = 3.86, p < 0.052) and a significant group by cue interaction (F (1,126) = 5.43, p < 0.021). A Bonferroni-corrected simple main analysis of the interaction confirmed that the Agency group responded significantly more to Y than X (t (126) = 3.04, p = 0.006). In contrast, the Passive group responded equally to both cues (t (126) = − 0.26, p = 1), as expected if cue […]
Figure 3 (caption fragment). […] The pretraining phase involved a simple discrimination between A(1) and B(0). During the compound phase, these trials continued to be presented, but were interleaved with AX(.75) and BY(.25) trials. Note that X has a higher probability of reward than Y. However, X signals a net decrement in the probability of reward when considered against the backdrop of A, whereas Y signals a net increase in the probability of reward when considered against the backdrop of B. Thus, to the extent credit assignment is competitive, Y should evoke more responding than X, but the opposite should be true if credit assignment is noncompetitive. To test this, two daily probe trials with X and Y were interleaved with training trials, starting on session 13 of the Compound phase. (B,C) Behavioral results in groups Agency and Passive, respectively. The left and right line plots depict performance during the Pretraining and Compound phases, respectively. Conditioned responding is represented as mean number of magazine head entries (± SEM).
Ruling out alternatives to the role of agency in competitive credit assignment.
The results so far can be readily interpreted by assuming that agency over learning simply enhances the animals' attention to task, general discrimination competence, or ability to process compounded stimuli concurrently. To test these interpretations, we compared performance between Agency and Passive rats in a patterning task in which opposite credit must be assigned to compound cues and their constituent elements (Fig. 4A). One such problem was a negative-patterning discrimination where two cues, a visual (A) and an auditory (X) stimulus, were rewarded when presented individually, but not in compound [A(1), X(1), AX(0)]. A second problem involved a positive-patterning discrimination in which another pair of visual (B) and auditory (Y) stimuli were rewarded when presented in compound, but not individually [B(0), Y(0), BY(1)]. If any of the aforementioned interpretations is correct, Passive animals should find this discrimination particularly difficult. Although both discriminations were trained concurrently, for simplicity's sake we treated them separately when displaying and analyzing the data (Fig. 4, Panels B […]
Taken together, these findings suggest that the deficits in competitive credit assignment previously observed in Passive rats were unlikely due to an inferior level of engagement or an inferior ability to solve complex discriminations, process compounded stimuli concurrently, or form configural representations. To buttress these conclusions, we also presented this patterning problem to the rats from the first study at the end of blocking training (Supplementary Materials, Exp. S2). Since those rats had already experienced the two auditory stimuli as X and Y, two novel auditory stimuli were used. The results of this replication confirmed those with naïve animals. That is, the same Passive rats that exhibited deficits in blocking were if anything better at solving complex nonlinear discriminations than their Agency counterparts.
This finding is consistent with prior evidence that cue competition between the elements of a compound can hinder the solution of nonlinear discriminations 36 .

Discussion
Competitive credit assignment among environmental cues is the backbone of associative and reinforcement learning models of Pavlovian conditioning, to the point that an inability to account for cue-competition phenomena renders a model obsolete 37 . Yet converging evidence indicates that competition is not automatically determined by the presence of other cues, but also by learning conditions such as trial spacing 20-22 . Specifically, when information is presented in massed fashion, cue competition is diminished 24-27 . Here, we provide evidence that granting rats agency over trial presentation can rescue competitive credit assignment from the detrimental effects of massed training. The beneficial effects of agency, often referred to as free (vs. forced) or self-determined (vs. imposed) choice, on performance have been documented in domains such as education 38 and creativity 39 . In the human cognitive literature, agency over what task 40,41 or what feature of a task to engage with 42 has been shown to enhance performance, and the neural bases of this phenomenon are receiving increasing attention 42-45 . Our data add to this literature by showing that the way predictive credit is negotiated among environmental cues is dramatically altered by whether the individual has control over the occurrence of those cues, even when there is no knowledge of the specific cue being presented.
Taken together, our results allow us to rule out various trivial explanations. Firstly, the beneficial effects of agency on competitive credit assignment are not simply the product of a heightened ability to process compounded stimuli concurrently or learn complex discriminations. Evidence for this comes from the superior performance of Passive groups in the patterning task. Secondly, neither can the contribution of background (contextual) cues to competitive learning explain our full pattern of results. While contextual conditioning could summate with responding to the target cues 46 and mask any differences when responding is at ceiling (e.g., Passive group in our blocking study), such masking would not occur when responding is below asymptote (e.g., in our novel cue-competition task studies). Thirdly, our data are likewise difficult to explain by a differential role of eligibility traces in Agency and Passive groups 2 . Specifically, in Passive animals, massed training might allow eligibility traces of recently presented cues to spill over into the subsequent trial and contaminate credit assignment. This effect would be weaker in Agency rats if trial-initiating responses serve to precipitate the decay of eligibility traces, essentially fulfilling the role of a long ITI. The issue with this hypothesis is that it also predicts poorer performance for Passive rats in the patterning task, which our data disconfirmed.
Our findings speak to the necessity of incorporating agency into theories of associative and reinforcement learning. This raises the question of what mechanisms might be responsible for the effects observed. One possibility is that agency alters the computation of prediction errors (PE). Recently, various authors have posited that self-determined choices induce a positivity bias in PEs 47 , either because positive PEs are amplified 44 or because negative PEs are discounted 42 . Such an imbalance would give excitatory learning (i.e., cue-reward) the upper hand over inhibitory learning (cue-no reward). To explore this possibility, we conducted a series of simulations of our experimental designs using standard associative theory 5 , inspired by Chambon et al. 47 . We assumed that positive PEs are weighted significantly more during learning than negative PEs under Agency, but not Passive conditions (Fig. S3, Supplementary Materials). Notably, the general prediction of a greater asymptote of responding in Agency animals was not confirmed by our data, although this may be due to ceiling effects in our dependent measure or differential learning-to-performance functions in the groups. More importantly, our simulations (Fig. S3) provided a proof of concept that a PE positivity bias can explain the basic effects we report. In our first blocking study (Fig. 1), for instance, a positivity bias predicts faster conditioning of the blocking cue A during the Pretraining phase in Agency than Passive rats. Assuming that learning about A is preasymptotic at the end of this phase (asymptotic learning would produce complete blocking in both groups), this cue will be in a better position to block X in the Compound phase in the Agency group. In our third study (Fig. 3), a positivity bias would readily accommodate the greater level of responding on BY(0.25) trials during the Compound phase in Agency relative to Passive rats, which, combined with the ongoing extinction of B on B(0) trials, would ensure that Y accrues more credit in the former group. Assuming that A blocks X from acquiring substantial credit on reinforced AX(0.75) trials in both groups, this would explain why Agency, but not Passive rats, responded more to Y than X on probe trials. A similar argument could be applied to the results of our second study (Fig. 2). Therein, Y(1) trials, which were relatively infrequent in each session, may have enjoyed faster excitatory learning in Agency rats, which, combined with the inability of X to acquire significant credit (as it is being blocked by A), would yield greater responding to Y than X in those animals. Finally, our simulations also showed that, in accordance with our findings, a positivity bias will disrupt the acquisition of a negative patterning discrimination in the Agency condition, although our data do not show a greater response summation on AX(0) trials as anticipated by the model. Overall, however, the positivity bias hypothesis 44,47 provides a parsimonious account of our results, and one that need not assume that the mechanisms directly responsible for cue competition (e.g., the computation of aggregate PEs 5,6 ) operate any differently in the presence or absence of agency.
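The blocking argument above can be sketched with a small simulation. This is an illustrative toy, not a reproduction of the simulations in Fig. S3: it assumes a Rescorla-Wagner-type summed-error rule with separate learning rates for positive and negative PEs, and the learning rates, trial counts, and trial order are hypothetical values chosen so that learning about A is preasymptotic at the end of pretraining.

```python
def rescorla_wagner(trials, alpha_pos, alpha_neg, v=None):
    """Summed-error learning with asymmetric rates for positive/negative PEs.

    trials    : iterable of (cues present, reward) pairs
    alpha_pos : learning rate applied when the prediction error is positive
    alpha_neg : learning rate applied otherwise
    v         : optional starting associative strengths (copied, not mutated)
    """
    v = dict(v or {})
    for cues, reward in trials:
        error = reward - sum(v.get(c, 0.0) for c in cues)
        rate = alpha_pos if error > 0 else alpha_neg
        for c in cues:
            v[c] = v.get(c, 0.0) + rate * error
    return v


# Hypothetical, shortened schedules: A(1), B(0) pretraining, then the
# same trials interleaved with AX(1) and BY(1) compounds.
PRETRAIN = [(("A",), 1.0), (("B",), 0.0)] * 10
COMPOUND = [(("A",), 1.0), (("B",), 0.0),
            (("A", "X"), 1.0), (("B", "Y"), 1.0)] * 20


def blocking_index(alpha_pos, alpha_neg):
    """Return (V_Y - V_X, final strengths); larger index = stronger blocking."""
    v = rescorla_wagner(PRETRAIN, alpha_pos, alpha_neg)
    v = rescorla_wagner(COMPOUND, alpha_pos, alpha_neg, v)
    return v["Y"] - v["X"], v


# Assumed parameters: Passive weights both PE signs equally; Agency
# weights positive PEs more heavily (the positivity bias).
passive_index, passive_v = blocking_index(alpha_pos=0.1, alpha_neg=0.1)
agency_index, agency_v = blocking_index(alpha_pos=0.3, alpha_neg=0.1)
print("Passive: X=%.3f Y=%.3f" % (passive_v["X"], passive_v["Y"]))
print("Agency:  X=%.3f Y=%.3f" % (agency_v["X"], agency_v["Y"]))
```

With these assumed values, A is closer to asymptote when compound training begins in the biased condition, so X acquires less strength and the Y minus X difference is larger. The sketch does not address the asymptote-of-responding prediction that the data failed to confirm.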
Admittedly, agency over learning might regulate other mechanisms besides the relative impact of PEs on learning; for instance, by modulating the allocation of attention to cues. According to selective attention accounts 1,48,49 , cue competition results from paying increasing attention to the most accurate predictors of an outcome while simultaneously learning to ignore poor predictors. Conceivably, agency over learning might facilitate this attentional divergence in the face of cognitively challenging massed training conditions. However, the beneficial effects of agency on credit assignment need not be limited to learning and attention, but could also work at the level of memory retrieval, in line with comparator mechanisms 50 .
The current findings have important implications both for normal functioning and mental health. In pedagogical settings, where massed instruction has long been known to be detrimental 51-53 , our data suggest the possibility that agency over the presentation of information might promote more competitive, and therefore selective, learning. In the context of mental health, our findings open up opportunities for therapeutic interventions based on enhancing the individual's perceived sense of agency in disorders characterized by attenuated cue competition, including psychotic 54,55 , attentional 56 , anxiety 57 , and substance use disorders 58 .
For the present, much work is needed to elucidate the complex role that agency is likely to play in learning and psychopathology. Consider the case of substance use disorders, where drug self-administration is known to mitigate some of the more dramatic and aversive effects of drugs of abuse 59,60 . In addition to allowing a more accurate prediction of drug receipt, self-administration might also foster cue competition and thereby preclude irrelevant stimuli from contributing to cue reactivity in the future. We speculate that as the sense of agency over drug consumption wanes and drug-related behaviors transition from voluntary and goal-directed to habitual and compulsive 61 , credit assignment might also become less competitive. This transition would exacerbate the individual's vulnerability to drug abuse and relapse by drastically expanding the set of stimuli capable of inducing cue reactivity. Consistent with this, evidence suggests that long-term exposure to potent rewards such as cocaine, heroin, and sucrose undermines competitive credit assignment 62,63 . In light of such implications, the present findings call for a closer investigation of the role of agency in credit assignment among predictive cues.

Materials and methods
For the sake of convenience, the four studies above will be referred to in this section as Exps. 1-4, and correspond, respectively, to the blocking task (Fig. 1), the novel cue-competition task (Fig. 2), its second variant (Fig. 3), and the patterning task (Fig. 4).

Experimental animals.
Each study used 16 experimentally-naïve, gender-balanced, adult Long-Evans rats, making a total of 64 animals. The ages and weights of the rats at the outset of each experiment were as follows. In Exp. 1, rats were ~ 20 weeks old (wo) and weighed 441-516 g (males) and 257-290 g (females); in Exp. 2, rats were ~ 13 wo and weighed 342-388 g (males) and 234-269 g (females); in Exp. 3, rats were ~ 20 wo and weighed 448-529 g (males) and 269-298 g (females); in Exp. 4, rats were ~ 22 wo and weighed 475-554 g (males) and 284-325 g (females). All animals were bred at Brooklyn College from commercially available populations (Charles River Laboratories). They were housed individually in standard clear-plastic tubs (10.5 in. × 19 in. × 8 in., Charles River Laboratories) with woodchip bedding. The colony room was maintained on a 14:10 h light/dark cycle. Behavioral sessions were conducted between 7-10 h after the onset of the light phase of the cycle. Throughout training, water access was restricted to 1 h/day following each experimental session while food was provided ad libitum. All animal care and experimental procedures were carried out in compliance with the ARRIVE guidelines 64 […]
Apparatus. Behavioral training was conducted in eight modular conditioning chambers (32-cm long × 25-cm wide × 33-cm tall, Med Associates, Inc.). Each chamber was enclosed in a ventilated sound-attenuating cubicle (74 cm × 45 cm × 60 cm) fitted with an exhaust fan that provided a background noise level of ∼50 dB. All reported locations of stimulus and response apparatus were measured from the grid floor of the conditioning chamber to the lowest point or edge of the apparatus. The left wall of the chamber housed two white jewel lamps (28 V DC, 100 mA) mounted on the left and right panels 9.3 cm from the grid floor.
Above each of these lamps was a speaker located 20.6 cm above the grid floor and connected to a dedicated tone generator capable of delivering a 2.5-Hz, 80-dB clicker (left panel) and a 70-dB white noise (right panel). Two additional speakers were located on the left and right panels of the right wall of the chamber 24.8 cm above the grid floor. Each of them was also connected to a dedicated tone generator capable of delivering a 12-kHz, 70-dB tone (left panel) and a 1-kHz, 80-dB tone (right panel). The right wall also housed a third jewel lamp located on the center panel 17.2 cm above the grid floor. Below this lamp, 4.6 cm above the grid floor, was a circular nose port 2.6 cm in diameter, equipped with a yellow LED light and an infrared sensor for detecting nose entries. This nose port was flanked by a recessed liquid reward magazine (aperture: 5.1 cm × 15.2 cm) located on the right panel, 1.6 cm above the grid floor. This magazine was equipped with an infrared sensor for detecting head entries, and connected to a liquid dipper that could deliver a 0.04 cc droplet of a 10% sucrose solution. The chambers remained dark throughout the experimental session except during presentations of the visual stimuli. In the same room was a computer running Med PC IV software (Med Associates Inc., St. Albans, VT, USA) on Windows OS which controlled and automatically recorded all experimental events via a Fader Control Interface.
Procedure. Magazine training. Prior to the beginning of each study, rats were first randomly assigned to either the master or yoked group (labeled Agency and Passive, respectively) with the constraint that each group be gender-balanced. Each animal assigned to the Agency group was paired with an age- and sex-matched Passive group animal. All sessions began with a 2-min acclimation period in the conditioning chambers. Rats initially received a session of magazine training in which they learned to retrieve a sucrose reward from the dipper cup. This session lasted 62 min and consisted of 60 trials. For the first 10 trials, sucrose was made available for 30 s every 30 s; for the second 20 trials, it was available for 20 s every 40 s; and for the last 30 trials, it was available for 10 s every 50 s.
Shaping. In all four studies, Agency rats went on to receive five shaping sessions in which they learned to self-initiate trials, following the procedure developed by Reverte et al. 28 . On the first shaping session, the nose-port light was turned on for a maximum of 20 s, during which a nose poke at the nose port immediately resulted in the termination of the nose-port light and a 10-s period of sucrose availability. Trials were separated by a 10-s variable ITI (range: 5−15 s). Failure to respond at the nose port resulted in the nose-port light turning off and the trial being repeated after a regular ITI. Over the following four shaping sessions, we introduced and progressively increased a delay of 2, 4, 6, and 8 s between the rat's response at the port and sucrose availability. During this delay, the nose-port light would flash at a 1-Hz frequency (on for 0.5 s, off for 0.5 s). Concurrently, reward availability was progressively shortened (8, 6, 4, and 3 s).
Throughout shaping training (and for the remainder of the experiment), Passive rats were yoked to their Agency counterparts to ensure that they received exactly the same sequence of events at the same time, except, of course, for the trial-initiation response.
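The session timings given above can be checked with a short calculation. The sketch below is illustrative only. It assumes that "available for 30 s every 30 s" means 30 s of sucrose availability followed by a 30-s gap, an interpretation chosen because it reproduces the stated 62-min session. All names are hypothetical and are not taken from the Med-PC code.

```python
# Magazine training blocks: (number of trials, seconds available, seconds of gap).
# Assumption: each trial lasts (availability + gap) seconds.
MAGAZINE_BLOCKS = [(10, 30, 30), (20, 20, 40), (30, 10, 50)]
ACCLIMATION_S = 120  # 2-min acclimation period

# Shaping sessions 1-5: delay between nose poke and sucrose, and sucrose availability.
SHAPING_DELAY_S = [0, 2, 4, 6, 8]
SHAPING_REWARD_S = [10, 8, 6, 4, 3]


def magazine_session_minutes(blocks=MAGAZINE_BLOCKS, acclimation_s=ACCLIMATION_S):
    """Total magazine-training session length in minutes under the assumption above."""
    trial_time_s = sum(n * (on + off) for n, on, off in blocks)
    return (acclimation_s + trial_time_s) / 60


def shaping_schedule():
    """Return one dict per shaping session with its delay and reward window."""
    return [
        {"session": i + 1, "delay_s": delay, "reward_s": reward}
        for i, (delay, reward) in enumerate(zip(SHAPING_DELAY_S, SHAPING_REWARD_S))
    ]


if __name__ == "__main__":
    print(magazine_session_minutes())
    for row in shaping_schedule():
        print(row)
```

Under this reading, every magazine-training trial occupies 60 s, so the 60 trials plus the 2-min acclimation period account for the full session.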
Trial structure. The trial structure was common to all four studies (Panel A, Fig. 1). Following shaping, experimental sessions began with a 30-s acclimation period. Agency rats were then presented with their first opportunity to start a trial, as signaled by the trial-availability cue (the onset of the nose-port light). The duration of this cue was 20 s, during which a response at the nose port would immediately turn off the nose-port light and turn on one of various possible visual, auditory, or audiovisual compound cues, which were always 10 s long. The trial types specified by these cues were selected from a pseudorandom list built with the constraint that no trial type could be presented more than three times in succession. On reinforced trials, the cues' offset coincided with the delivery of a 0.04-cc bolus of sucrose, which remained available for 3 s, after which a short ITI followed (mean: 10 s; range: 5-15 s). As during shaping, failure to self-initiate a trial terminated the nose-port light after 20 s and led to a regular ITI period (mean: 10 s; range: 5-15 s). Passive rats received the same sequence of events (including the same trial types at the same time and in the same order) as their Agency counterparts, but in standard Pavlovian fashion (i.e., noncontingent on any response). For any yoked pair of rats, a session terminated once the Agency rat completed all scheduled trials or timed out after 90 min. Since Agency rats self-paced their training, they could take breaks that lengthened the effective average ITI. Such pauses were of course also applied to their yoked counterparts. Supplementary Table S1 provides the effective ITI durations as well as the total session durations for each experiment. Furthermore, granting agency over trial presentation necessarily entailed the risk that rats would not complete all scheduled trials within the imposed time limit of 90 min, in which case the session would time out.
Once again, the use of a master-yoked procedure ensured that this issue affected both groups equally. Supplementary Table S2 [...]
Experiment 1. Training consisted of two phases ([...] Fig. 1). In the first, pretraining phase, rats in both groups received 14 sessions of A(1) vs. B(0) discrimination training, where A and B were visual cues and the numbers in parentheses represent the probability of reward. One visual cue was constructed by flashing the two jewel lamps on the left wall alternately at a 2-Hz frequency (on for 0.25 s, off for 0.25 s), whereas the other was provided by the steady illumination of the white jewel lamp located on the right wall. These cues were counterbalanced, and were presented 48 times each in a session.
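A pseudorandom trial list of the kind described under Trial structure (no trial type more than three times in succession) could be built as in the following sketch, shown here for the 48 A and 48 B trials of the Exp. 1 pretraining phase. This is not the authors' procedure, which the text does not specify. The function names and the count-weighted sampling with restarts are assumptions made for illustration, and the sampler is not uniform over all valid sequences.

```python
import random
from itertools import groupby


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def pseudorandom_trials(counts, max_run=3, seed=None, max_restarts=1000):
    """Build a trial list in which no trial type repeats more than max_run times in a row.

    counts: dict mapping trial type -> number of trials per session.
    Trials are drawn one at a time, weighted by how many of each type remain;
    if the draw reaches a dead end, the whole list is rebuilt from scratch.
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = dict(counts)
        seq = []
        while any(remaining.values()):
            # True when the last max_run trials are all of the same type.
            run_full = len(seq) >= max_run and len(set(seq[-max_run:])) == 1
            options = [
                label for label, n in remaining.items()
                if n > 0 and not (run_full and label == seq[-1])
            ]
            if not options:
                break  # dead end: abandon this attempt and restart
            weights = [remaining[label] for label in options]
            choice = rng.choices(options, weights=weights)[0]
            seq.append(choice)
            remaining[choice] -= 1
        else:
            return seq  # all trials were placed without hitting a dead end
    raise RuntimeError("no valid sequence found; relax max_run or raise max_restarts")


if __name__ == "__main__":
    session = pseudorandom_trials({"A": 48, "B": 48}, seed=1)
    print("".join(session), longest_run(session))
```

Plain shuffle-and-reject would also work in principle, but with two trial types and 96 trials most shuffles contain a run of four or more, which is why the sketch builds the list sequentially.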
The second, compound phase comprised 20 sessions, during which rats continued to receive A(1) and B(0) trials, each presented 36 times per session. In addition, compound trials AX(1) and BY(1) were introduced, where X and Y represent two auditory cues. These auditory cues were provided by a 12-kHz, 70-dB tone and a 70-dB white noise, counterbalanced. There were 12 presentations of each of the AX and BY compounds per session. From session 9 to the end of the compound phase, two probe trials with each of the target cues, X (i.e., the cue to be blocked) and Y (the control cue), were additionally administered. This increased the total number of trials per session in Phase 2 from 96 to 100.
Experiment 2. Training consisted of two phases (see table in Panel A, Fig. 2). In the first, pretraining phase, rats received 10 sessions of A(1) vs. B(0) and X(0.75) vs. Y(0.25) discrimination training, where A and B were the same visual cues and X and Y were the same auditory cues used in Exp. 1, also counterbalanced. Once again, the numbers in parentheses indicate the probability of reward. Each cue was presented 24 times in a session.
In the second, compound phase, the probability of reward for each cue trained in Phase 1 was held constant, but cue A was added to all trials in which X was reinforced, whereas B was added to all trials in which Y was not reinforced. Thus, the compound phase consisted of the following trial types: 10A(1), 10B(0), 30AX(1), 10X(0), 30BY(0), 10Y(1), where the coefficients represent the number of trials presented in a session (100 trials in total). Phase-2 training proceeded for 20 sessions.
Experiment 3. The study comprised two phases (see table in Panel A, Fig. 3). In the pretraining phase, rats received 8 sessions of A(1) vs. B(0) discrimination training, where A and B were the same visual cues used in Exp. 1. Each of these trial types was presented 48 times in a session.
In the second, compound phase, rats received 32 sessions of discrimination training. During this phase, A(1) vs. B(0) training continued, but audiovisual compounds AX(0.75) and BY(0.25) were added, with X and Y being the same auditory stimuli used in Exp. 1. Specifically, the compound phase consisted of the following training trials: 24A(1), 24B(0), 18AX(1), 6AX(0), 6BY(1), 18BY(0), where the coefficients denote the number of trials presented in a session. Starting on session 13, two probe trials with each of cues X and Y were interleaved with training trials on every session, raising the total number of trials per session from 96 to 100.
Experiment 4. Rats received a single phase of training consisting of 42 sessions with two concurrently trained nonlinear discrimination problems. One of these problems was an A(1), X(1), AX(0) negative-patterning discrimination, whereas the other was a B(0), Y(0), BY(1) positive-patterning discrimination. Cues A and B were the same visual cues, and X and Y were the same auditory cues used in the previous experiments, counterbalanced within modality. All trial types were presented 16 times in a session, making a total of 96 trials.
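The per-session trial compositions listed for Experiments 2 to 4 can be tallied to confirm the stated totals and reward probabilities. The following sketch copies the trial counts from the text. The data layout and function names are hypothetical and serve only as an arithmetic check.

```python
from fractions import Fraction

# Each entry: (cues present on the trial, rewarded?, trials per session).
EXP2_COMPOUND = [("A", True, 10), ("B", False, 10), ("AX", True, 30),
                 ("X", False, 10), ("BY", False, 30), ("Y", True, 10)]
EXP3_COMPOUND = [("A", True, 24), ("B", False, 24), ("AX", True, 18),
                 ("AX", False, 6), ("BY", True, 6), ("BY", False, 18)]
EXP4 = [("A", True, 16), ("X", True, 16), ("AX", False, 16),
        ("B", False, 16), ("Y", False, 16), ("BY", True, 16)]


def total_trials(design):
    """Number of training trials per session."""
    return sum(n for _, _, n in design)


def reward_probability_by_cue(design, cue):
    """P(reward) across every trial on which `cue` is present, alone or in compound."""
    rewarded = sum(n for cues, r, n in design if cue in cues and r)
    total = sum(n for cues, _, n in design if cue in cues)
    return Fraction(rewarded, total)


def reward_probability_by_type(design, trial_type):
    """P(reward) across trials of exactly this type (e.g. the compound 'AX')."""
    rewarded = sum(n for cues, r, n in design if cues == trial_type and r)
    total = sum(n for cues, _, n in design if cues == trial_type)
    return Fraction(rewarded, total)


if __name__ == "__main__":
    print(total_trials(EXP2_COMPOUND), reward_probability_by_cue(EXP2_COMPOUND, "X"))
    print(total_trials(EXP3_COMPOUND), reward_probability_by_type(EXP3_COMPOUND, "AX"))
    print(total_trials(EXP4))
```

In Exp. 2 the probability is computed per cue, because X and Y keep their Phase-1 contingencies across element and compound trials. In Exp. 3 it is computed per trial type, because the 0.75 and 0.25 contingencies are defined for the AX and BY compounds.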
Behavioral measures. Conditioned responding was measured in both groups as the number of head entries into the sucrose magazine during the last 5 s of the 10-s cues. Focusing the analysis on the latter half of the cue has two advantages. First, it provides a cleaner measure of goal-tracking behavior, as sign-tracking behavior (which we did not measure, and which may have differed between the groups) tends to concentrate in the first half of a 10-s cue 32 . Second, it filters out any bias in conditioned behavior resulting from the fact that Agency and Passive rats began their trials at different locations in the chamber relative to the sucrose magazine. Indeed, whereas Agency rats necessarily had their snouts in the adjacent nose port at the time of cue onset, Passive rats were free to roam in the chamber and approach the magazine at all times, compromising any between-group comparison at the start of the cue period.
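As an illustration of this measure, the sketch below counts head entries whose timestamps fall in the last 5 s of a 10-s cue. The function name, the timestamp format (seconds from session start), and the half-open window convention (entries at the exact moment of cue offset are excluded) are assumptions, not details reported in the text.

```python
def entries_in_last_window(entry_times, cue_onset, cue_duration=10.0, window=5.0):
    """Count head entries during the final `window` seconds of a cue.

    entry_times: timestamps (s) of magazine head entries.
    cue_onset:   timestamp (s) at which the cue turned on.
    The scoring window is [cue offset - window, cue offset).
    """
    cue_offset = cue_onset + cue_duration
    window_start = cue_offset - window
    return sum(window_start <= t < cue_offset for t in entry_times)


if __name__ == "__main__":
    # Hypothetical head-entry timestamps around a cue that starts at t = 100 s.
    entries = [100.5, 104.9, 105.0, 107.2, 109.99, 110.0]
    print(entries_in_last_window(entries, cue_onset=100.0))
```

With these hypothetical timestamps, the entries at 100.5 s and 104.9 s fall in the first half of the cue and are ignored, as is the entry at cue offset.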

Statistical analysis.
To prepare for analysis, we averaged the number of magazine entries in each session, producing a single response value for each subject, to each cue, for that session. Where session blocks are reported, we then averaged these values over the appropriate number of sessions. These data were then analyzed using Group × Session (or Session-block) × Cue mixed ANOVAs, with the exception of Experiment S1, which used a Cue × Session-block repeated-measures ANOVA. All analyses were conducted using the GAMLj package for Jamovi 66,67 , which employs the Satterthwaite method for calculating degrees of freedom (https://gamlj.github.io/). The regression intercept of each subject was treated as a random effect to control for subject differences. Significant interactions were explored using either Bonferroni-corrected simple effects analyses or post hoc tests, both reported as t-values. Statistical tables are provided in the Supplementary Materials.
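The data-preparation step described above (per-session means, then optional session-block means) can be sketched as follows. This covers only the averaging and does not reproduce the mixed-model analysis run in GAMLj. The record layout, function names, and the assumption that sessions are numbered from 1 are illustrative.

```python
from collections import defaultdict
from statistics import mean


def session_means(trials):
    """Average magazine entries per subject, session, and cue.

    trials: iterable of (subject, session, cue, n_entries), one record per trial.
    Returns {(subject, session, cue): mean entries}.
    """
    grouped = defaultdict(list)
    for subject, session, cue, n_entries in trials:
        grouped[(subject, session, cue)].append(n_entries)
    return {key: mean(values) for key, values in grouped.items()}


def block_means(per_session, block_size):
    """Average the per-session means over consecutive blocks of `block_size` sessions.

    Assumes sessions are numbered from 1, so sessions 1..block_size form block 1.
    Returns {(subject, block, cue): mean of session means}.
    """
    grouped = defaultdict(list)
    for (subject, session, cue), value in per_session.items():
        block = (session - 1) // block_size + 1
        grouped[(subject, block, cue)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical records for one rat and one cue over two sessions.
    records = [("r1", 1, "A", 2), ("r1", 1, "A", 4), ("r1", 2, "A", 6), ("r1", 2, "A", 6)]
    per_session = session_means(records)
    print(per_session)
    print(block_means(per_session, block_size=2))
```

Averaging within sessions first, and only then across sessions, gives each session equal weight in a block even when a timed-out session contributes fewer trials.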