Striatal prediction errors support dynamic control of declarative memory decisions

Adaptive memory requires context-dependent control over how information is retrieved, evaluated and used to guide action, yet the signals that drive adjustments to memory decisions remain unknown. Here we show that prediction errors (PEs) coded by the striatum support control over memory decisions. Human participants completed a recognition memory test that incorporated biased feedback to influence their recognition criterion. Using model-based fMRI, we find that PEs—the deviation between the outcome and expected value of a memory decision—correlate with striatal activity and predict individuals' final criterion. Importantly, the striatal PEs are scaled relative to memory strength rather than to the expected trial outcome. Follow-up experiments show that the learned recognition criterion transfers to free recall, and that targeting biased feedback to experimentally manipulate the magnitude of PEs shifts criterion in a manner consistent with PEs scaled relative to memory strength. These results provide convergent evidence that declarative memory decisions can be regulated via striatally mediated reinforcement learning signals.

DDM posterior predictive checks. Although model fit statistics were used to select the best-fitting model, posterior predictive checks further confirm that the best-fitting model credibly reproduces key patterns in the behavioral data. (a) Quantile plots of observed and simulated RT distributions from Experiment 1. Five RT quantiles (.1, .3, .5, .7, .9) from the empirical RT distributions are plotted for old and new responses (X's). Simulated RTs from the drift criterion DDM (squares) and response bias DDM (triangles) are shown for comparison. For all quantiles for both models, the empirical quantile value fell within the 95% credible interval of the simulated data. (b) Criterion as a function of Run, calculated for the empirical and simulated data from Experiment 1. Both models reproduce the liberal shift in criterion observed in the empirical data over the course of the experiment. For each model, 500 simulated data sets were generated from the posterior distributions of model parameters, with each simulation containing 480 simulated trials per participant. (c) Quantile plots of observed and simulated RT distributions from Experiment 2. For all quantiles for both models, the empirical quantile value fell within the 95% credible interval of the simulated data.

All reported clusters were significant at a False Discovery Rate (FDR)-corrected threshold of p < .05 at the cluster level. Whole-brain maps were initially thresholded at p < .001, uncorrected, and cluster corrected to p < .05 using SPM's FDR algorithm. The critical cluster extent for each contrast is listed above.
Rows that denote the peak voxel within a cluster include a value for "Number of voxels". Local maxima within the cluster reported by SPM are listed on subsequent rows.
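The quantile-based posterior predictive check described above can be sketched in a few lines: compute the empirical RT quantiles, compute the same quantiles for each posterior-predictive simulated data set, and ask whether each empirical value falls inside the 95% interval across simulations. This is a generic illustration, not the authors' analysis code; the simulator and data below are stand-ins.

```python
import numpy as np

QUANTILES = [0.1, 0.3, 0.5, 0.7, 0.9]

def ppc_quantile_check(observed_rts, simulate, n_sims=500):
    # Empirical quantiles of the observed RT distribution.
    emp_q = np.quantile(observed_rts, QUANTILES)
    # Quantiles of each posterior-predictive simulated data set.
    sim_q = np.array([np.quantile(simulate(), QUANTILES) for _ in range(n_sims)])
    # 95% interval of each quantile across simulations.
    lo, hi = np.percentile(sim_q, [2.5, 97.5], axis=0)
    # True where the empirical quantile falls inside the interval.
    return (emp_q >= lo) & (emp_q <= hi)

# Toy usage with a lognormal RT-like distribution as stand-in data.
rng = np.random.default_rng(0)
observed = rng.lognormal(mean=-0.5, sigma=0.4, size=480)
ok = ppc_quantile_check(observed, lambda: rng.lognormal(-0.5, 0.4, size=480))
```

In the reported analyses, `simulate` would draw a full data set of 480 trials per participant from the posterior distributions of the DDM parameters.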

The two New groups adopted a more conservative criterion and the two Old groups adopted a more liberal criterion (F(1,60) = 30.551, p < .001, ηp² = .337). We also found evidence of incremental learning: the difference between the New groups and the Old groups increased over the course of the experiment (group by time interaction, F(1.8,120) = 14.415, p < .001, ηp² = .194).
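The conservative versus liberal criterion shifts described above can be quantified with the standard signal-detection criterion. A minimal sketch, assuming the conventional formula c = -(z(H) + z(F))/2 (this is the textbook definition, not code from the authors):

```python
from statistics import NormalDist

def criterion(hit_rate, fa_rate):
    # c = -(z(H) + z(F)) / 2; positive c = conservative (bias toward "new"),
    # negative c = liberal (bias toward "old").
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(fa_rate))
```

For example, a participant with a .70 hit rate and a .10 false-alarm rate has a positive (conservative) criterion, while one with a .90 hit rate and a .50 false-alarm rate has a negative (liberal) criterion.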

To specifically assess the role of PEs, we next analyzed the effect of the targeted confidence manipulation. We note that we did not observe a statistically reliable interaction between targeted confidence and time (F(2,120) = 2.406, p = .095). Because false feedback is provided immediately from the onset of the test phase, even the criterion estimate during the initial time period reflects the influence of feedback on an individual's criterion. Indeed, in many decision-making and reinforcement learning tasks, the learning rate is often highest during the initial portion of a learning experience. Because the experiment was double blind and employed random assignment, we have no reason to assume that there were group differences in default criterion before participants began the experiment. Thus, we interpret the main effect of targeted confidence as evidence that the false feedback manipulation caused shifts in criterion from individuals' pre-experimental default criterion. However, we cannot rule out the possibility that our effect is due to a failure of random assignment (a Type I error with probability equal to our alpha level of 0.05).

We did observe an interaction between targeted confidence and time for the two New groups (F(2,60) = 5.020, p < .01) but not for the two Old groups (F < 1). This null result may be due to rapid learning in the Old-Low Confidence condition that resulted in a large criterion shift in the initial phase (Supplementary Fig. 4).

Supplementary Fig. 5a shows the correlation between Net MS-PE scores and terminal criterion for the four false feedback groups. However, this relationship also includes the group effect of targeted response: because the two New groups received more positive feedback on new responses and adopted a conservative criterion (and vice versa for the Old groups), this group effect may be driving the correlation (cf.
Simpson's Paradox). In order to demonstrate the effect of Net MS-PEs over and above the effect of targeted response group (New/Old), we performed a multiple regression analysis that included a categorical predictor for Group (participants in the two New groups were coded as 1 and participants in the two Old groups were coded as 0) and a predictor for Net MS-PE. In this model, the regression weight for the Net MS-PE predictor represents the partial correlation controlling for the effect of Group. This model provided a reliable fit to the data (R = 0.679; p < .001), and Net MS-PE was a significant predictor of terminal criterion (p = .034). Supplementary Fig. 5b depicts a partial regression plot, which shows the residuals from regressing Terminal Criterion against Group on the vertical axis against the residuals from regressing Net MS-PE against Group on the horizontal axis. The correlation between these two sets of residuals shows the linear relationship (partial correlation) between Terminal Criterion and Net MS-PE in the multiple regression controlling for the effect of Group. This regression model indicates that Net MS-PE predicts criterion even after controlling for the effect of group, consistent with a role for PE-based learning that occurs for both the veridical and false feedback groups.
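The partial-regression-plot logic just described (residualize both the outcome and the predictor of interest on Group, then correlate the residuals) can be sketched as follows. This is a generic illustration with hypothetical variable names, not the authors' analysis code:

```python
import numpy as np

def partial_corr_via_residuals(y, x, group):
    # Design matrix: intercept plus the binary Group predictor (New = 1, Old = 0).
    G = np.column_stack([np.ones(len(group)), np.asarray(group, float)])
    def resid(v):
        beta, *_ = np.linalg.lstsq(G, np.asarray(v, float), rcond=None)
        return np.asarray(v, float) - G @ beta
    # Correlation of the two residual vectors = the partial correlation
    # between y (terminal criterion) and x (Net MS-PE) controlling for Group.
    return np.corrcoef(resid(y), resid(x))[0, 1]
```

Equivalently, the standardized regression weight for Net MS-PE in a multiple regression containing both predictors captures the same Group-adjusted relationship.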
Finally, we used this individual differences approach to assess whether the Net MS-PE signal provides a better fit to the data than a Net Expected Response Outcome PE (ERO-PE) signal. For each trial, we calculated the ERO-PE (see Fig. 1; Methods) and then transformed the sign of this PE signal to match the effect on criterion (as described above). For each participant, we then took the mean of the ERO-PE across all trials to compute the Net ERO-PE.

However, a manipulation check found that this approach did not lead to reliable differences in the average confidence of trials targeted for false feedback; that is, the average confidence targeted for false feedback in the High Confidence groups was not reliably different from that in the Low Confidence groups. Inspection of the data suggested that this manipulation failure was due to individual differences in error rates and in the way that participants used the confidence scale. For example, some participants, on average, were more likely to use the portion of the confidence scale above the midpoint. If such a participant were assigned to a Low Confidence group, she would not be provided many instances of false feedback. For a participant in the Old-High Confidence group, an old response error given a confidence rating above the 17th percentile mark in her specific confidence distribution would receive false positive feedback, whereas an old error given a confidence rating below the 17th percentile mark in her confidence distribution would receive veridical negative feedback. This would ensure that false feedback was provided with the appropriate frequency (as estimated by the script) and preferentially targeted to high but not low confidence old errors for this participant. Conversely, for a participant in the Old-Low Confidence group, an old error given a confidence rating below the 83rd percentile received false positive feedback; an old error given a confidence rating above the 83rd percentile mark received veridical negative feedback.
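The percentile-based targeting rule above can be sketched as a single decision function: false feedback is given when an old-response error falls on the targeted side of a percentile cutoff within that participant's own confidence distribution. Function and parameter names here are illustrative, not the experimental script itself:

```python
import numpy as np

def give_false_feedback(confidence, conf_history, target_high, threshold_pct):
    # Percentile cutoff computed within this participant's own
    # distribution of confidence ratings.
    cutoff = np.percentile(conf_history, threshold_pct)
    # High Confidence groups: false feedback for errors ABOVE the cutoff
    # (e.g. threshold_pct = 17); Low Confidence groups: false feedback for
    # errors BELOW it (e.g. threshold_pct = 83).
    return bool(confidence > cutoff) if target_high else bool(confidence < cutoff)
```

Because the cutoff adapts to each participant's confidence distribution, false feedback keeps its intended frequency even for participants who use only part of the scale.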
The feedback manipulation was successful at targeting false feedback to high or low confidence responses. The average confidence of trials receiving false feedback was submitted to a targeted response (Old/New) by targeted confidence (Low/High) ANOVA. The average targeted confidence for the Low Confidence groups was significantly lower than for the High Confidence groups (Means = 0.26, 0.62; F(1,60) = 95.860, p < .001). The experimental script was designed to provide 30 false feedback trials to each participant; the mean number of false feedback trials was 33.1. This deviation was likely due to participants' proclivity to make increased errors (due to the false feedback manipulation) over the course of the experiment, such that the experimental script initially underestimated participants' total error rates and subsequently provided additional false feedback trials.
There was a significant difference in the number of false feedback trials between the Low and High Confidence groups (F(1,60) = 6.927, p < .05), such that participants in the Low Confidence groups were provided more false feedback than the High Confidence groups (Means: Old-Low Confidence: 37.4; Old-High Confidence: 29.4; New-Low Confidence: 34.1; New-High Confidence: 31.6). This is consistent with the idea that errors are generally made with lower confidence. Note that this difference alone cannot explain the pattern we report in the recognition criterion, as the larger magnitude criteria were shown by the Old-Low Confidence group and the New-High Confidence group.