Behavioural and neural characterization of optimistic reinforcement learning

Abstract

When forming and updating beliefs about future life outcomes, people tend to consider good news and to disregard bad news. This tendency is assumed to support the optimism bias. Whether this learning bias is specific to ‘high-level’ abstract belief updating or is a particular expression of a more general ‘low-level’ reinforcement learning process is unknown. Here we report evidence in favour of the second hypothesis. In a simple instrumental learning task, participants incorporated better-than-expected outcomes at a higher rate than worse-than-expected ones. In addition, functional imaging indicated that inter-individual differences in the expression of optimistic updating correspond to enhanced prediction error signalling in the reward circuitry. Our results constitute a step towards understanding the genesis of the optimism bias at the neurocomputational level.
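The asymmetry described above can be illustrated with a minimal sketch. It assumes a Rescorla-Wagner-style value update with two learning rates, one applied to positive prediction errors and one to negative ones. The function names, parameter values and the single-option simulation are illustrative assumptions, not the task or model specification from the article.

```python
import random


def update_value(q, reward, alpha_pos, alpha_neg):
    """One value update with a valence-dependent learning rate.

    alpha_pos is applied when the outcome is better than expected
    (positive prediction error), alpha_neg otherwise.
    """
    delta = reward - q  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def mean_learned_value(alpha_pos, alpha_neg, p_reward=0.5,
                       n_trials=2000, seed=0):
    """Average learned value of one option rewarded with probability p_reward.

    Hypothetical setup: binary outcomes (1 or 0), initial value 0.
    """
    rng = random.Random(seed)
    q = 0.0
    total = 0.0
    for _ in range(n_trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        q = update_value(q, reward, alpha_pos, alpha_neg)
        total += q
    return total / n_trials


if __name__ == "__main__":
    # Symmetric learner: the learned value tracks the true reward rate.
    print("unbiased  :", round(mean_learned_value(0.25, 0.25), 3))
    # Optimistic learner (alpha_pos > alpha_neg): the value is inflated.
    print("optimistic:", round(mean_learned_value(0.40, 0.10), 3))
```

Under these assumptions, the expected learned value settles near p·α⁺ / (p·α⁺ + (1 − p)·α⁻), so a learner with α⁺ > α⁻ overestimates an option's worth relative to its true reward probability p, while a symmetric learner does not.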

Figure 1: Behavioural task and variables.
Figure 2: Behavioural and computational identification of optimistic reinforcement learning.
Figure 3: Functional signatures of the optimistic reinforcement learning.
Figure 4: Robustness of optimistic reinforcement learning.

Acknowledgements

We thank Y. Worbe and M. Pessiglione for granting access to the first dataset, V. Wyart, B. Bahrami and B. Kuzmanovic for comments, and T. Sharot and N. Garrett for providing activation masks. S.P. was supported by a Marie Sklodowska-Curie Individual European Fellowship (PIEF-GA-2012 Grant 328822) and is currently supported by an ATIP-Avenir grant (R16069JS). G.L. was supported by a PhD fellowship of the Ministère de l'enseignement supérieur et de la recherche. M.L. was supported by an EU Marie Sklodowska-Curie Individual Fellowship (IF-2015 Grant 657904) and acknowledges the support of the Bettencourt-Schueller Foundation. The second experiment was supported by the ANR-ORA, NESSHI 2010–2015 research grant to S.B.-G. The Institut d’Étude de la Cognition is supported by the LabEx IEC (ANR-10-LABX-0087 IEC) and the IDEX PSL* (ANR-10-IDEX-0001-02 PSL*). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

G.L. performed the experiment, analysed the data and wrote the manuscript. M.L. provided analytical tools, interpreted the results and edited the manuscript. F.M. provided analytical tools and edited the manuscript. S.B.-G. interpreted the results and edited the manuscript. S.P. designed the study, performed the experiments, analysed the data and wrote the manuscript.

Corresponding author

Correspondence to Stefano Palminteri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Notes, Supplementary Figures 1–7, Supplementary Discussion, Supplementary Methods, Supplementary References. (PDF 606 kb)

Cite this article

Lefebvre, G., Lebreton, M., Meyniel, F. et al. Behavioural and neural characterization of optimistic reinforcement learning. Nat Hum Behav 1, 0067 (2017). https://doi.org/10.1038/s41562-017-0067
