Neural substrates of parallel devaluation-sensitive and devaluation-insensitive Pavlovian learning in humans

We aim to differentiate the brain regions involved in the learning and encoding of Pavlovian associations sensitive to changes in outcome value from those that are not sensitive to such changes by combining a learning task with outcome devaluation, eye-tracking, and functional magnetic resonance imaging in humans. Contrary to theoretical expectation, voxels correlating with reward prediction errors in the ventral striatum and subgenual cingulate appear to be sensitive to devaluation. Moreover, regions encoding state prediction errors appear to be devaluation insensitive. We can also distinguish regions encoding predictions about outcome taste identity from predictions about expected spatial location. Regions encoding predictions about taste identity seem devaluation sensitive while those encoding predictions about an outcome’s spatial location seem devaluation insensitive. These findings suggest the existence of multiple and distinct associative mechanisms in the brain and help identify putative neural correlates for the parallel expression of both devaluation sensitive and insensitive conditioned behaviors.

Major points:
1. I find the framing of the study in the introduction somewhat problematic. The clear dichotomy between model-based and model-free works well for behavior, and using devaluation to tease them apart is perfectly fine. But I am not sure this can be directly translated to neural responses. It certainly works well for reward PEs: devaluation-insensitive reward PEs mean they are computed based on cached values, whereas devaluation-sensitive reward PEs indicate that they are computed using model-based/inferred value estimates (e.g., Sadacca et al 2016, eLife). In contrast, it does not work so well for predicted reward identity (or side) or state PEs. There are many instances where you wouldn't expect a model-based signal to be sensitive to devaluation. For example, assuming state PEs are a model-based signal that supports learning of a model that may be used for model-based inference, you would not expect this signal to be devaluation sensitive. You want this signal to construct a model of the world as it is and not only based on your current state. Similarly, not all predictive representations of reward identity should be devaluation sensitive because we need an invariant model of the world to do model-based inference (we should not update the model that stimulus A predicts chocolate just because we ate a bunch of chocolate). So ideally, both of these model-based signals should not be devaluation sensitive. On the other hand, we expect other model-based signals to be devaluation sensitive. For instance, devaluation-sensitive representations of predicted reward identity would suggest that these signals integrate the identity and the inferred value of the expected outcome, i.e., the output of model-based inference. My point is that there is no clear a priori reason for specific model-based representations in the brain to be devaluation sensitive or not. However, knowing which signals are sensitive and which are not is important and helps us to better understand the role of these brain areas in behavior. The introduction covers a lot of ground that is only partially relevant to the current question. I think the paper would be improved if it focused more on the key questions and explained what different results would mean for our understanding of these brain areas in Pavlovian learning.
3: I think most readers would appreciate an explanation of the rationale for testing BOLD responses to omitted outcomes in extinction to determine neural responses to reward and state PE. How can we interpret these results? Also, only the response differences (valued (CS+ - CS-) minus devalued (CS+ - CS-)) are shown, but it would be interesting to see the responses (CS+ minus CS-) individually for valued and devalued omitted outcomes.

Figure
3. The current study examines two different state-based (value-neutral) signals: identity and side. It is very possible that they are not devaluation sensitive to the same degree. In fact, only reward identity is affected by the current devaluation procedure, and so you could expect differences in how signals for side and identity are modulated by devaluation. The decoding analysis based on CS-evoked fMRI responses distinguishes between identity and side, but I did not see a similar analysis for state PEs. Examining differences between identity and location PE, and how they change with devaluation, would be very interesting.
4. The results section (page 13) would benefit from a description of "the unexpected side effects and the unexpected identity effects as measured with reaction times" before using them for the correlation analysis with the state-PE responses. Something like the last section of the methods section would be helpful. Also, from which areas were the state-PE betas (shown in Figure 4) extracted? Please describe this in the main text and clarify that these analyses focus on data from the acquisition phase.
5. I am not sure I fully understand the logic of comparing pre-post differences in CS-evoked univariate responses between devalued and non-devalued rewards in areas that show significant decoding for side and reward identity. I'd have expected to see a comparison of decoding accuracy for reward identity or side between pre- and post-devaluation, though I can imagine that this is challenging given the limited data.
6. What is the distribution of the fitted learning rates from the two models? Are they correlated across subjects? Also, how well did the two models (with best-fitting learning rates) explain the pupil data? Assuming identical learning rates, how different are the trial-by-trial estimates of V_FW and V_RW? I also don't fully understand equation (9). What values can 'R' take on the left and right hand side of the equation? Where do the values for E[R/US] come from in this model? Is this an indicator function that produces 1 for rewarding US and 0 for neutral US, or are these objective probabilities from the task design?
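To make the question about V_FW and V_RW concrete, here is a minimal sketch of the comparison I have in mind. It rests on my own assumptions rather than on the manuscript: V_RW is taken to follow a standard Rescorla-Wagner update, V_FW is taken to be the sum over outcomes of a learned CS-US transition probability multiplied by E[R/US], and E[R/US] is treated as an indicator (1 for a rewarding US, 0 otherwise). The function names, the outcome labels, and the initial values are hypothetical.

```python
def rescorla_wagner(rewards, alpha, v0=0.0):
    """Trial-by-trial cached value and reward PE for a single CS.

    Assumed form (not taken from the manuscript): V <- V + alpha * (R - V).
    """
    values, reward_pes = [], []
    v = v0
    for r in rewards:
        values.append(v)           # prediction before the outcome
        delta = r - v              # reward prediction error
        reward_pes.append(delta)
        v += alpha * delta
    return values, reward_pes


def forward_values(outcomes, eta, expected_reward, t0):
    """Trial-by-trial value derived from learned CS-US transition probabilities.

    Assumed form (not taken from the manuscript):
    V = sum over US of T(CS, US) * E[R/US], with T updated by a state PE.
    """
    t = dict(t0)                   # transition estimates, copied so t0 is untouched
    values, state_pes = [], []
    for us in outcomes:
        values.append(sum(t[o] * expected_reward[o] for o in t))
        state_pes.append(1.0 - t[us])   # surprise about the observed outcome
        for o in t:
            target = 1.0 if o == us else 0.0
            t[o] += eta * (target - t[o])
    return values, state_pes


if __name__ == "__main__":
    # Hypothetical trial sequence for one CS: rewarded, omitted, rewarded, rewarded.
    v_rw, _ = rescorla_wagner([1, 0, 1, 1], alpha=0.5)
    v_fw, _ = forward_values(
        ["sweet", "none", "sweet", "sweet"],
        eta=0.5,
        expected_reward={"sweet": 1.0, "none": 0.0},   # indicator-style E[R/US]
        t0={"sweet": 0.0, "none": 1.0},
    )
    print(v_rw)
    print(v_fw)
```

Under these particular assumptions (identical learning rates, indicator-style E[R/US], matched starting values) the two value traces coincide during acquisition and would only diverge once E[R/US] is changed, for example by devaluation. If something similar holds for the authors' models, it would be useful to state explicitly which data can distinguish them.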
Minor comments:
1. Please clarify in the main text that responses in Figure 3A and C were based on data from the acquisition phase of the task.
2. Was subjects' food intake prior to the experiment controlled? How much did subjects eat (in calories) during the devaluation procedure?
3. Did pleasantness ratings differ between the two food identities? And if so, does this introduce a confound in the analysis?
4. There were 60 trials per conditioning run. How many trials for each of the five CSs and for each of the three deviant outcomes per CS? What were the deviant outcomes for the neutral CS?
5. Figure 4: If I understand this plot correctly, correlations of state PEs with both behavioral measures (location and identity) are shown in panel A, whereas panel B shows correlations with reward PE. If this is correct, then please correct the main text on the bottom of page 13 to reflect this (this text currently suggests that correlations with state PE for identity and location effects are shown in panel A and B, respectively).
6. The authors are aware that the side decoding analysis is confounded by motor responses and eye movements. Please describe in the methods section how exactly the maps from the side decoding analysis were masked to remove signals related to motor responses and eye movements.

7. Training and testing the classifier on data from the same runs (even when training and testing on different trials) may introduce biases. I assume the authors used this approach to maximize the amount of training data. However, it would be reassuring if they could show that the basic results hold up when training and testing the classifier on trials from different runs. Perhaps this control analysis could be performed in the ROIs from the original analysis.
8. I was not clear what the authors meant by "concatenated" when describing the GLMs. First, I assumed that they appended data from all runs to estimate a single parameter estimate for a given condition for all runs. But this would not work for the analysis described on page 42, which separates responses pre- and post-devaluation. Do the authors mean that the GLM included data from multiple runs but each condition was still modeled with different regressors to obtain independent parameter estimates? If the latter is the case (which is the default in SPM), I would suggest removing the term concatenation, as it is typically used to refer to what is done using spm_fmri_concatenate.
9. Please check the reference list for typos.
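To illustrate the control analysis suggested in minor comment 7, a leave-one-run-out split could look like the sketch below. This is only an illustration under my own assumptions: the function name and the run labels are hypothetical, and any classifier could be fitted on the training indices and evaluated on the held-out run.

```python
def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices) for each run.

    run_labels holds one entry per trial, giving the run that trial came from.
    Training and test trials never share a run.
    """
    for held_out in sorted(set(run_labels)):
        train = [i for i, run in enumerate(run_labels) if run != held_out]
        test = [i for i, run in enumerate(run_labels) if run == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical example: five trials drawn from three runs.
    for run, train_idx, test_idx in leave_one_run_out([1, 1, 2, 2, 3]):
        print(run, train_idx, test_idx)
```

Decoding accuracy averaged over these folds could then be compared with the accuracy from the original within-run scheme in the same ROIs.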
Reviewer #2 (Remarks to the Author):
In this manuscript, the authors aimed to differentiate brain areas involved in learning and encoding associations that are sensitive to changes in the value of an outcome from those that are not sensitive to such changes. They find that regions whose activity correlates with reward prediction errors are sensitive to outcome devaluation, challenging the assumption that reward prediction errors are exclusively model-free. Similarly, brain areas correlating with state prediction errors were found to be less sensitive to outcome devaluation, challenging the assumption that state prediction errors are model-based.
Such findings appear noteworthy and of broad interest. The manuscript is clearly written, the methodology is sound, and the conclusions are supported by the data. I only have a few minor suggestions that should be considered before publication.
-p. 9-10 "outcome devaluation induced changes": what about RTs, did outcome devaluation have any effect on RTs at Test?
-p. 28 "the video appeared on the left or right white frame": what appeared on the other frame? Fig. 1a seems to show that the background of the clip appears on the other frame; please add this to the description. Is this background also presented in "no outcome" trials (p. 29)? You should specify this.
-p. 28 please specify that participants needed to press a button corresponding to the side of outcome delivery, and that RTs were collected.