Information about action outcomes differentially affects learning from self-determined versus imposed choices

Abstract

The valence of new information influences learning rates in humans: good news tends to receive more weight than bad news. We investigated this learning bias in four experiments, by systematically manipulating the source of required action (free versus forced choices), outcome contingencies (low versus high reward) and motor requirements (go versus no-go choices). Analysis of model-estimated learning rates showed that the confirmation bias in learning rates was specific to free choices, but was independent of outcome contingencies. The bias was also unaffected by the motor requirements, thus suggesting that it operates in the representational space of decisions, rather than motoric actions. Finally, model simulations revealed that learning rates estimated from the choice-confirmation model had the effect of maximizing performance across low- and high-reward environments. We therefore suggest that choice-confirmation bias may be adaptive for efficient learning of action–outcome contingencies, above and beyond fostering person-level dispositions such as self-esteem.
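The asymmetry described in the abstract can be illustrated with a small sketch. It assumes a standard delta-rule (Rescorla-Wagner) value update and a softmax choice rule, neither of which is spelled out in the abstract. All names (`update_value`, `alpha_pos`, `alpha_neg`, `alpha_forced`, `simulate`), the ±1 outcome coding and the parameter values are hypothetical and chosen for illustration only. This is not the authors' implementation, which is available from the repository listed under Code availability.

```python
import math
import random


def update_value(q, reward, free_choice, alpha_pos, alpha_neg, alpha_forced):
    """One delta-rule update of the chosen option's value.

    Assumption: free choices use separate learning rates for positive and
    negative prediction errors; forced choices use a single learning rate.
    """
    delta = reward - q  # prediction error
    if free_choice:
        alpha = alpha_pos if delta > 0 else alpha_neg
    else:
        alpha = alpha_forced
    return q + alpha * delta


def softmax_choice(q_values, beta, rng):
    """Choose option 0 or 1 with a two-option softmax (assumed choice rule)."""
    p0 = 1.0 / (1.0 + math.exp(-beta * (q_values[0] - q_values[1])))
    return 0 if rng.random() < p0 else 1


def simulate(reward_probs, n_trials, alpha_pos, alpha_neg, beta=5.0, seed=0):
    """Return the fraction of free-choice trials on which the richer option
    was chosen. Outcomes are coded +1 / -1 (an illustrative assumption)."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    best = max(range(2), key=lambda i: reward_probs[i])
    n_best = 0
    for _ in range(n_trials):
        choice = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < reward_probs[choice] else -1.0
        # Every trial here is a free choice, so alpha_forced is never used.
        q[choice] = update_value(
            q[choice], reward, True, alpha_pos, alpha_neg, alpha_forced=0.0
        )
        n_best += int(choice == best)
    return n_best / n_trials


if __name__ == "__main__":
    # Compare a biased learner (alpha_pos > alpha_neg) with an unbiased one
    # in a low-reward and a high-reward environment (hypothetical values).
    for probs in ([0.1, 0.3], [0.7, 0.9]):
        biased = simulate(probs, 200, alpha_pos=0.4, alpha_neg=0.1)
        unbiased = simulate(probs, 200, alpha_pos=0.25, alpha_neg=0.25)
        print(probs, "biased:", biased, "unbiased:", unbiased)
```

A single simulated run like this is noisy. Any comparison between biased and unbiased learners would need averaging over many seeds and a wider range of parameter values before supporting a conclusion about performance.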

Fig. 1: Schematic of the trial procedure and stimuli.
Fig. 2: Behavioural results.
Fig. 3: Parameter results of the full model from all four experiments.
Fig. 4: Model comparison results from all four experiments.
Fig. 5: Model comparison from experiment 2.
Fig. 6: Comparison of the winning model (3α) with a model with a simple perseveration parameter.
Fig. 7: Learning rate analysis and model comparison for H&L models.
Fig. 8: Valence-dependent learning biases as a function of the choice type, execution mode and outcome type.

Data availability

The data that support the findings of this study are available from the GitHub repository (https://github.com/spalminteri/agency).

Code availability

Custom code scripts have been made available on the GitHub repository (https://github.com/spalminteri/agency). Additional modified scripts can be accessed upon request.

Acknowledgements

V.C. was supported by the Agence Nationale de la Recherche (ANR) grants ANR-17-EURE-0017 (Frontiers in Cognition), ANR-10-IDEX-0001-02 PSL (program ‘Investissements d’Avenir’), ANR-16-CE37-0012-01 (ANR JCJ) and ANR-19-CE37-0014-01 (ANR PRC). H.T. was supported by a PSL/ENS studentship. M.V. was supported by FIRE (‘Programme Bettencourt’) and by a Région Île-de-France studentship. P.H. was supported by the Chaire Blaise Pascal of the Région Île-de-France. S.P. was supported by an ATIP-Avenir grant (R16069JS), the Programme Emergence(s) de la Ville de Paris, the Fyssen Foundation and the Fondation Schlumberger pour l’Education et la Recherche (FSER). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

V.C., S.P. and P.H. developed the study concept. Testing and data collection were performed by H.T. and M.V. H.V. helped to write the Psychtoolbox script for data collection. Data analysis was performed by V.C., H.T., M.V. and S.P. V.C. and H.T. drafted the manuscript. S.P. and P.H. provided critical revisions. All authors approved the final version of the manuscript for submission.

Corresponding authors

Correspondence to Valérian Chambon, Héloïse Théro or Stefano Palminteri.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Primary Handling Editor: Marike Schiffer.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Results, Supplementary Figs. 1–3, Supplementary Tables 1–3 and Supplementary References.

Reporting Summary

About this article

Cite this article

Chambon, V., Théro, H., Vidal, M. et al. Information about action outcomes differentially affects learning from self-determined versus imposed choices. Nat Hum Behav 4, 1067–1079 (2020). https://doi.org/10.1038/s41562-020-0919-5
