Article | Published:

Behavioural evidence for parallel outcome-sensitive and outcome-insensitive Pavlovian learning systems in humans

Abstract

There is a dichotomy in instrumental conditioning between goal-directed actions and habits that are distinguishable on the basis of their relative sensitivity to changes in outcome value. It is less clear whether a similar distinction applies in Pavlovian conditioning, where responses have been found to be predominantly outcome-sensitive. To test for both devaluation-insensitive and devaluation-sensitive Pavlovian conditioning in humans, we conducted four experiments combining Pavlovian conditioning and outcome-devaluation procedures while measuring multiple conditioned responses. Our results suggest that Pavlovian conditioning involves two distinct types of learning: one that learns the current value of the outcome, which is sensitive to devaluation, and one that learns about the spatial localization of the outcome, which is insensitive to devaluation. Our findings have implications for the mechanistic understanding of Pavlovian conditioning and provide a more nuanced understanding of Pavlovian mechanisms that might contribute to a number of psychiatric disorders.

Code availability

Code used to generate the figures and the results of the four studies reported in this manuscript is available through the Open Science Framework repository: https://osf.io/rve2p/

Data availability

Data from the four studies reported in this manuscript are available through the Open Science Framework repository: https://osf.io/rve2p/

Acknowledgements

This work was supported by a NIDA-NIH R01 grant (1R01DA040011-01A1) to J.P.O. and W.M.P. and by an Early Postdoctoral Mobility fellowship from the Swiss National Science Foundation (P2GEP1162079) to E.R.P. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The authors thank O. D. Perez and V. Sennwald for insightful comments on this manuscript.

Author information

E.R.P., W.M.P., C.S.K. and J.P.O. designed the experiments. E.R.P. and C.S.K. collected and analysed the data. E.R.P., W.M.P., C.S.K. and J.P.O. wrote the paper. All authors discussed the results and implications and commented on the manuscript at all stages.

Correspondence to Eva R. Pool.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information: Supplementary Methods and Supplementary Notes.

Reporting Summary

Fig. 1: Schematic representation of the experimental design.
Fig. 2: Effect of conditioning during the learning phase of Experiment 1.
Fig. 3: Manipulation check of the outcome-devaluation procedure.
Fig. 4: Effects of the outcome-devaluation procedure on different conditioned responses during Experiment 1 and Experiment 2.
Fig. 5: Illustration of the sequence of events in a trial for Experiment 3.
Fig. 6: Illustration of the main effects during Experiment 3.
Fig. 7: Effect of conditioning during Experiment 4.