NMDA receptor-dependent plasticity in the nucleus accumbens connects reward-predictive cues to approach responses

Learning associations between environmental cues and rewards is a fundamental adaptive function. Via such learning, reward-predictive cues come to activate approach to locations where reward is available. The nucleus accumbens (NAc) is essential for cued approach behavior in trained subjects, and cue-evoked excitations in NAc neurons are critical for the expression of this behavior. Excitatory synapses within the NAc undergo synaptic plasticity that presumably contributes to cued approach acquisition, but a direct link between synaptic plasticity within the NAc and the development of cue-evoked neural activity during learning has not been established. Here we show that, with repeated cue-reward pairings, cue-evoked excitations in the NAc emerge and grow in the trials prior to the detectable expression of cued approach behavior. We demonstrate that the growth of these signals requires NMDA receptor-dependent plasticity within the NAc, revealing a neural mechanism by which the NAc participates in learning of conditioned reward-seeking behaviors.

(a) Representative behavioral raster plots of one animal on the first (left, Day 1) and the last (right, Day 6) day of training. Within each panel, performance is divided into S+ and S- trials. Each trial is shown in a different row, and trials are sorted earliest to latest from bottom to top. Black horizontal lines within each trial represent periods when the rat's head was inside the reward receptacle. Data are aligned to the time of cue onset (vertical red line). Arrows mark the 10 s interval before and after cue onset. The raster plots show that, early in training, an overall high frequency of entry into the reward receptacle may preclude the interpretation of entry during the S+ as specifically cue-driven behavior. Note that fluctuations in S+ responding are accompanied by fluctuations in responding during the intertrial interval (ITI). This emphasizes the need to consider the rate of indiscriminate responding (i.e., during the ITI) when quantifying cued responding.
(b) Calculating the performance index. The left panel represents hypothetical performance on ten trials aligned to the time of S+ onset (vertical red line). Pink rectangles span the duration of the S+. Black rectangles depict entries into the receptacle. A dashed red line marks the start of a window beginning 10 s before cue onset. For each trial, two latency values were calculated: the interval from the point 10 s before the cue to the first receptacle entry occurring before the cue (ITI pseudolatency), and the interval from cue onset to the first receptacle entry while the cue was on (cued latency). If no entry was made during one of these periods, a value of 10 s was assigned. To calculate the performance index of the animal on a given trial, its cued latency on that trial was subtracted from its ITI pseudolatency on the same trial. The performance index ranges from -10 to 10, with negative values indicating that the animal entered the receptacle faster in the absence of the cue than in its presence, and positive values indicating the opposite. Values around zero suggest that the cue has no influence on receptacle entry behavior. The table on the right shows the ITI pseudolatency, cued latency and performance index corresponding to the trials shown in the left panel.

(a) Hypothetical application of the change point algorithm to the cumulative record as of trial 30 (adapted from Gallistel, Fairhurst and Balsam, 2004). First, a straight line is drawn from trial 30 to the origin. Second, the trial that maximally deviates from that line is identified as a potential change point (test change point). Third, performance values before and after the test point are compared. If the null hypothesis of no change can be rejected at a user-specified significance value (we chose p < 0.05; logit = 1.3), that test change point is considered a candidate change point. The algorithm then truncates the data at that point and treats that candidate change point as the new origin. Finally, the algorithm starts the process all over again, running successively over each trial in the cumulative record.
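The performance index and the first two steps of the change point procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, a missing entry is encoded as `None`, and the deviation from the line is measured vertically (for a fixed line this picks the same trial as the perpendicular distance). The statistical comparison of performance before and after the test point is not included.

```python
from itertools import accumulate

MAX_LATENCY_S = 10.0  # value assigned when no entry occurs in a 10 s window


def performance_index(iti_pseudolatency, cued_latency):
    """ITI pseudolatency minus cued latency for one trial (None = no entry)."""
    iti = MAX_LATENCY_S if iti_pseudolatency is None else iti_pseudolatency
    cued = MAX_LATENCY_S if cued_latency is None else cued_latency
    return iti - cued


def cumulative_record(indices):
    """Running sum of the trial-by-trial performance index."""
    return list(accumulate(indices))


def max_deviation_trial(cumulative):
    """Return the 1-based trial whose cumulative value deviates most from the
    straight line joining the origin to the last trial in the record."""
    n_trials = len(cumulative)
    if n_trials == 0:
        raise ValueError("cumulative record is empty")
    slope = cumulative[-1] / n_trials
    deviations = [
        abs(value - slope * (trial + 1)) for trial, value in enumerate(cumulative)
    ]
    return deviations.index(max(deviations)) + 1


# Hypothetical trials: (ITI pseudolatency, cued latency) in seconds.
trials = [(None, None), (2.0, 8.0), (None, 1.5)]
indices = [performance_index(iti, cued) for iti, cued in trials]
record = cumulative_record(indices)
```

A trial with no entry in either window yields an index of 0, which matches the legend's reading that values around zero reflect no influence of the cue.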
(b) The result of this iterative algorithm is typically a list of candidate change point trials. Gallistel et al. take the first candidate change point in the cumulative record as the definitive change point, i.e., the first trial after which cued behavior can be consistently detected. However, they applied the algorithm to behavioral variables that can only take null or positive values, which yield cumulative records in which a change of slope can only reflect an improvement in behavior or a lack thereof (i.e., the slope can only be positive or 0). In contrast, our performance index can also capture instances in which the animal's likelihood or speed of cued responding is lower than what would be expected from its baseline behavior. As a result, at the beginning of training, it is not unusual to find brief increases in the slope of the line followed by decreases. For that reason, for a candidate change point to be identified as definitive in our paradigm, all subsequent segments between candidate change points in the cumulative record had to have a positive slope; otherwise the candidate change point was rejected. The slope of these segments could fluctuate (as is common for conditioned behavior even after it is acquired), but it could not be negative. Therefore, we defined the definitive change point as the first candidate change point for which all subsequent slopes are positive, and we report this trial as the change point (CP) in the main text. This trial corresponds to that on which consistent cued behavior first appears.
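The selection rule for the definitive change point can be sketched as below. This is an illustrative reading of the rule rather than the authors' implementation: the function name is hypothetical, candidates are 1-based trial numbers, the last segment is assumed to run from the final candidate to the last trial, and "positive" is taken as strictly greater than zero.

```python
def definitive_change_point(cumulative, candidates):
    """Return the first candidate change point after which every segment of
    the cumulative record has a positive slope, or None if there is none.

    cumulative[i] is the cumulative performance index after trial i + 1.
    """
    n_trials = len(cumulative)
    # A candidate on the final trial has no later segment to evaluate.
    points = sorted(c for c in set(candidates) if 1 <= c < n_trials)
    bounds = points + [n_trials]
    # Slope of each segment between consecutive boundaries.
    slopes = [
        (cumulative[end - 1] - cumulative[start - 1]) / (end - start)
        for start, end in zip(bounds[:-1], bounds[1:])
    ]
    for k, trial in enumerate(points):
        if all(slope > 0 for slope in slopes[k:]):
            return trial
    return None


# Hypothetical record: an early candidate (trial 2) is followed by a decline,
# so it is rejected; trial 5 is the first with only positive slopes after it.
record = [1, 3, 2, 0, -1, 2, 6, 9, 13, 18]
cp = definitive_change_point(record, candidates=[2, 5, 8])
```

In this example the segment from trial 2 to trial 5 has a negative slope, so trial 2 is rejected even though it is the first candidate, which is where this rule departs from taking the first candidate outright.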
(c) Sample performance of one subject ("B") throughout training in three graphs. Gray lines indicate the transition between sessions. Top: average S+ performance index in five-trial bins (blue). Middle: cumulative S+ performance index record. Blue dots mark all of the candidate change points identified by the algorithm. The vertical red line marks the change point. Bottom: trial-by-trial S+ performance (black) and average performance before and after the change point (red).

(a) Sample perievent time raster plots (top) and histograms (bottom) aligned to the time of S+ onset. Each row of graphs shows three representative neurons of the same animal, one recorded on the day before change point (left), another one recorded on the change point session (middle) and the last one recorded on the sixth day of training (right). Dots in the raster plots represent action potentials fired by the recorded neuron, and trials are sorted from earliest to latest from top to bottom. Histograms were converted to firing rate using 50 ms bins. The y-axis of histograms is capped at 15 Hz to facilitate comparison across neurons. "Day" numbers refer to the training day.

(a) Population firing rate (median and interquartile range) in the 100-400 ms window after S+ (light blue) or S- (dark blue) onset by session. Numbers indicate sample size. The gray line indicates the cumulative percentage of units recorded from animals that exhibited a behavioral change point on or before that session. Post-cue firing was higher in S+ than S- trials in most sessions (*p < 0.05; **p < 0.01; ***p < 0.001, Wilcoxon).
(c) Population firing rate (median and interquartile range) in the 100-400 ms window after S+ presentation when the S+ was a tone (blue) or a light (red) in the sessions before (left, "Pre CP") or after change point (right, "Post CP"). There was no main effect of the sensory modality of the cue (Tone vs. Light; F(1,145) = 0.006, p = 0.9403).
(d) Same as Fig. 2 but for the 750-2000 ms post-cue window. Starting just before behavioral change point, firing rate after S+ onset was higher than after S- onset in this window (**p < 0.01; ***p < 0.001, Wilcoxon).
(e) To test whether the firing rate of NAc neurons was elevated prior to receptacle entry in S+ trials even when the latency to enter was long, we calculated the firing rate during the pre-entry 2 s window in trials during which it took animals 5 s or more to make a receptacle entry. Starting before behavioral change point, pre-entry NAc firing rate was higher in S+ than S- trials even when the latency to enter the reward receptacle was over 5 s (*p < 0.05; **p < 0.01, Wilcoxon).
(f) Each line depicts the average firing rate of each recorded neuron in the post-cue 100-400 ms window after S+ and S- cues that subjects responded to (resp.) or missed. Units are divided into three blocks depending on whether the session in which they were recorded was before the behavioral change point, the session during which the change point took place, or after the change point. Within each block, neurons are sorted from top to bottom in descending order according to the magnitude of their activity in the 100-400 ms post-S+ window. The legend on the right shows the correspondence between colors and firing rate values.

(a) From left to right, four heat maps represent average neuronal activity around the time of S+ onset, S+ entry, S- entry and ITI entry. Across heat maps, each line represents the same neuron. Units are divided into three blocks depending on whether the session during which they were recorded took place before the behavioral change point, on the session during which the change point took place, or after the change point. Within each block, neurons are sorted from top to bottom in descending order according to the magnitude of their activity in the 100-400 ms post-S+ window. The legend on the right shows the correspondence between colors and firing rate values (in Z scores).
(b-c) Black dots represent each neuron's firing rate in the 100-400 ms window after S+ onset plotted against the same neuron's firing rate in the 0-1500 ms window after S- (b) or ITI (c) entry before (top) and after (bottom) behavioral change point. The regression line is shown in gray and the outliers are depicted in red. Outliers are excluded from the analyses that yielded the results shown in these graphs. Including those outliers did not substantially change the results (Supplementary Table 1). Firing rate after S- or ITI entry was not significantly correlated with S+-evoked firing rate before or after change point (p > 0.05).

(a) For animals whose arrays were not driven down after each session, comparison of average S+ performance index (***t = -. 4, p < 0.001), entry probability (** , latency (**t = , p 0.00 ) and ITI pseudolatency (t = -1. , p = 0. 9) before change point (Pre CP) vs. after change point (Post CP).
(b) When electrode arrays are not driven down in between sessions, the resulting data set includes recordings of some neurons that are the same across days and of others that are not. This means that data collected across days contain a mixture of repeated and non-repeated measures. This precludes the comparison between sessions using statistical inference tests, since these tests require that observations across conditions consist of either repeated measures samples (within-subjects comparisons) or different samples (across-subjects comparisons). Driving the electrodes down in between sessions to sample a new population of neurons each day avoids this confound, but it also introduces a potential anatomical confound when comparing neuronal activity across sessions. To assess whether advancement of the probes had an effect on the learning-related increase in S+-evoked firing, we compared post-S+ firing in the group of subjects whose arrays were maintained in the same location during training with that of subjects whose arrays were advanced in between sessions (Fig. 2), both before and after the change point. The graph shows firing rate (median and interquartile range) in the 400 ms post-S+ window before change point (Pre CP) and after change point (Post CP) in cue-excited neurons of rats whose arrays were driven down (blue) or not (gray) after each session. S+-evoked activity before or after the change point is similar across groups (p > 0.05, Wilcoxon).
Average activity per channel (in channels that captured firing rate from two or more units) on the day before (left) and the day after (right) behavioral change point during the 100-400 ms window after S+ (light blue) or S- (dark blue). Within-channel comparisons showed that activity evoked by the S+ was higher than activity evoked by the S- in both sessions. They also revealed that S+-evoked activity was higher on the day after behavioral change point compared to the day before behavioral change point (**p < 0.01; ***p < 0.001, Wilcoxon). These results suggest that the emergence of cue-evoked excitations observed in Fig. 2 is not accounted for by the dorsoventral location of the recording electrodes.

(a) Mean±SEM entry probability during the S+ (light blue), S- (dark blue) or pre-S+ ITI window (gray) in animals that received daily bilateral AP5 injections prior to training.
(b-c) Same as '(a)' but for latency and ITI pseudolatency (b), and performance index (c).
(d) Firing rate (median and interquartile range) in the 100-400 ms window after presentation of S+ (light red) or S- (dark red) in 35-trial bins (each bin corresponds to a session) in animals that received daily bilateral AP5 injections. During the first session, activity elicited by the S- was higher than activity elicited by the S+ (**p < 0.001, Wilcoxon). Post-S+ firing was comparable to post-S- firing in subsequent sessions (p > 0.05, Wilcoxon). Numbers indicate sample size.

(a) Probability of entry during the S+ (left) or S- (right) before (Pre) or after (Post) infusion of vehicle (blue, n = 6) or AP5 (red, n = 5) in moderately trained animals. In S+ trials, entry probability was significantly diminished by microinjection of AP5 (*t = -3.504, p = 0.0248) but not vehicle (t = -0.445, p = 0.6624).
(d-f) Same as "(a-c)" but for animals that received extended (n = 5) instead of moderate training. A two-factor ANOVA using drug and time as within-subject factors revealed no main or interactive effects in S+ or S- entry probability/latency and ITI pseudolatency (all effects: p > 0.05).
(g-h) Baseline firing rate before injection plotted against baseline firing rate after saline (g) or AP5 (h) injection. In both cases, the 99% confidence interval (CI) around the slope of the regression line (vehicle: 0.46-1.38; AP5: 0.59-1.06) did not significantly differ from the unity line (i.e. the confidence interval contained the value "1"), suggesting that baseline firing rate was not affected by either injection.
(i-j) Same as "(g-h)" but for animals that received extended training prior to the saline (0.92-1.16) or AP5 ( 0 4, 1.11) injection. The baseline firing rate in these animals was also unaffected by the injections.

Panel titles: Baseline (-2000 to 0 ms before S+), all neurons; Extinction test ("learners"), all neurons.
(a) Raw firing rate (median and interquartile range) in the 2 s window before S+ onset in the saline (blue) or AP5-treated (red) side in 35-trial bins around the trial in which the behavioral change point took place. Numbers represent the number of neurons recorded in each bin in the vehicle (blue) or the AP5-treated hemisphere (red). There was no difference in baseline firing rate across hemispheres in any of the bins (p > 0.05, Wilcoxon; Holm-Sidak adjusted).
(b) During the extinction test, learners' firing rate (median and interquartile range) in the 2 s window before S+ onset in the hemisphere that had been treated with saline (blue) or AP5 (red) during training. There was no difference in baseline firing rate across hemispheres during this session (p > 0.05, Wilcoxon).

(a) Heat maps representing firing rate in 50 ms bins around the time of S+ entry (top), S- entry (middle) or ITI entry (bottom) in the vehicle (left) or AP5-treated side (right) of subjects that received unilateral AP5 microinjections during training. Each line on each heat map represents a neuron. Neurons are divided into two blocks depending on whether the animal learned the task during training (learner) or not (non-learner). In the learners block, neurons are further divided into three blocks: units recorded before the change point, during the session in which the CP took place, or after the CP. Within each one of these blocks, units are sorted from top to bottom in descending order based on their average firing rate in the 0-500 ms window after the event the data are aligned to (i.e., S+ entry, S- entry or ITI entry, respectively). The magnitude of the firing rate in each bin is color-coded according to the legend on the right.

Supplementary Figure 10
(b) Firing rate during the pre-entry 2 s window in the vehicle (blue) or AP5-treated side (red) in S+ trials during which it took animals 5 s or more to make a receptacle entry. Starting before behavioral change point, pre-S+-entry firing rate was higher in the vehicle than in the AP5-treated side even when the latency to enter the reward receptacle was long (*p < 0.05; ***p < 0.001, Wilcoxon).
The proportion of excited (solid) or inhibited (empty) units upon S+ entry before (left) or after (right) CP across hemispheres (vehicle: blue; AP5: red) was comparable (p > 0.05, Fisher). The magnitude of the post-S+-entry response of these units (insets: median and interquartile range) was also similar (p > 0.05, Wilcoxon).

Supplementary Figure 11. Cue-evoked excitations did not emerge in the NAc core neurons of animals that failed to learn the task under daily unilateral AP5 injections ("non-learners").
(a) Individual cumulative performance index records on S+ (left) and S- (right) trials in animals that received unilateral AP5 injections and did not learn the task. Each line represents a different animal. A positive change point was not identified in their S+ performance.
, latency (c) and entry probability (d) of non-learners in 5-trial bins throughout training. S+ trials are represented in light blue, S- trials in dark blue and, in gray, the 10 s ITI window that preceded the S+.
(e) For animals that failed to learn the task, population firing rate in NAc neurons in the vehicle (left) or AP5 (right) side in S+ trials (light blue/red) and S- trials (dark blue/red) in the 100-400 ms window after the cue. S+-evoked excitations did not emerge throughout training on either side (p > 0.05, Wilcoxon; Holm-Sidak adjusted).
(f) The proportion of significantly S+-excited (top) or inhibited (bottom) units in the vehicle (blue) and AP5-treated (red) side of non-learners. Throughout training, the percentage of neurons whose activity was significantly modulated by the cue did not differ across hemispheres (p > 0.05, Fisher; Holm-Sidak adjusted). Only in the last session was there a significant increase in the percentage of cue-excited units (*p = 0.0465, Fisher; Holm-Sidak adjusted).
(g) Performance index in S+ (light blue) and S- (dark blue) trials during the drug-free extinction test in the two non-learners that were given this test.
(h) Firing rate around the time of S+ onset in 50 ms bins in the vehicle (blue) and AP5 (red) sides during the drug-free extinction test in animals that failed to learn the task during training. The inset represents the percentage of units that were excited by the S+ during the drug-free extinction test in the hemispheres that, during training, received either vehicle (blue; n = 26) or AP5 (red; n = 15) injections. There were no differences in the percentage of cue-excited units across hemispheres in these animals (p = 1, Fisher).

Supplementary Figure 12. Anatomical location of injection and recording sites. For each experiment, diagrams show coronal sections of the rat brain at different anteroposterior coordinates. In animals that received no infusions, empty blue circles mark the tips of the electrode arrays. Solid dots mark the sites where the injectors delivered saline (blue), AP5 (red) or either one depending on the session (purple).