Behavioral and neurophysiological correlates of regret in rat decision-making on a neuroeconomic task

Journal: Nature Neuroscience
Volume: 17
Pages: 995–1002
DOI: 10.1038/nn.3740

Abstract

Disappointment entails the recognition that one did not get the value expected. In contrast, regret entails recognition that an alternative (counterfactual) action would have produced a more valued outcome. In humans, the orbitofrontal cortex is active during expressions of regret, and humans with damage to the orbitofrontal cortex do not express regret. In rats and nonhuman primates, both the orbitofrontal cortex and the ventral striatum have been implicated in reward computations. Using an economic framework, we recorded neural ensembles from orbitofrontal cortex and ventral striatum in rats encountering wait or skip choices for delayed delivery of different flavors. Economically, encountering a high-cost choice after skipping a low-cost choice should induce regret. In these situations, rats looked backwards toward the lost option, cells within orbitofrontal cortex and ventral striatum represented the missed action, rats were more likely to wait for the long delay, and rats rushed through eating the food after that delay.

Figures

Figure 1: Restaurant Row and revealed preferences in rats.

(a) The Restaurant Row task consisted of a central ring with four connected spokes leading to individual food flavors. Rats ran counterclockwise around the ring, encountering the four invisible zones (square boxes) sequentially. Color reflects flavor: pink, cherry; yellow, banana; black, unflavored (plain); brown, chocolate. (b–e) Rats typically waited through short delays but skipped long delays. Each panel shows the stay or go decisions for all encounters of a single rat running a single session (R210-2011-02-02). A small vertical jitter has been added for display purposes. Thresholds were fit as described in the Online Methods. (f–i) Rats R210 (f), R222 (g), R231 (h) and R234 (i) each demonstrated a different revealed preference that was consistent within a rat across all sessions but differed among rats. Thresholds were fit for each flavor for each session. Each panel shows the mean fit threshold for a given rat, with s.e.m. over sessions. An important consideration is to control for the possibility that rats were waiting for a specific cue before leaving the zone.
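
The thresholds above come from the fitting procedure in the Online Methods. As a rough illustration of that kind of fit (a minimal sketch, not the authors' exact procedure: a logistic stay/skip curve and hypothetical single-session data), a delay threshold can be estimated as the point where the probability of staying crosses 0.5:

```python
# Minimal sketch, not the authors' exact procedure: estimate a stay/skip
# threshold for one flavor in one session by fitting a logistic curve to the
# binary stay (1) / skip (0) decisions as a function of the offered delay.
# The delay at which p(stay) = 0.5 is taken as the threshold.
import numpy as np
from scipy.optimize import curve_fit

def p_stay(delay, threshold, slope):
    """Probability of waiting out an offer of the given delay (s)."""
    return 1.0 / (1.0 + np.exp(slope * (delay - threshold)))

def fit_threshold(delays, stayed):
    """delays: offered delays (s); stayed: 1 if the rat waited, 0 if it skipped."""
    p0 = [np.median(delays), 1.0]          # initial guess: threshold near the median offer
    (threshold, slope), _ = curve_fit(p_stay, delays, stayed, p0=p0, maxfev=5000)
    return threshold, slope

# Hypothetical single-session, single-flavor encounters
delays = np.arange(1, 30, 2, dtype=float)
stayed = np.array([1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)
print(fit_threshold(delays, stayed))       # fitted threshold lands near 15 s for these data
```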

Figure 2: Ensembles in OFC and vStr represent the current reward and the current zone.

(a,b) p(reward) at each reward for OFC (a) and vStr (b), defining the training set for decoding as activity at reward delivery and the test set as activity at each moment surrounding reward delivery (shaded area, s.e.m.). The neural ensemble decoded the current reward reliably (distribution of current reward was determined to be significantly different, empirical cumulative distribution function, significant at α = 0.05). p(reward) is the posterior probability indicating the likelihood of representing a given reward flavor as calculated by the Bayesian decoding. (c) For a,b, the training set is the reward types and the test set is activity when the rat receives reward. Rat icon indicates that decoding is aligned to reward delivery (when the rat is already at the feeders). Filled-circle feeder locations indicate that the training set for the decoder is based on responses to reward delivery. Dashed lines indicate zone location. (d,e) p(zone) at each zone for OFC (d) and vStr (e), defining the training set for decoding as neuronal activity at zone entry and the test set as neuronal activity at each moment surrounding zone entry. The neural ensemble decoded the current zone reliably. p(zone) is the posterior probability indicating representation of a given zone entry as calculated by Bayesian decoding. (f) For d,e, the training set is zone entry and the test set is neuronal activity when the rat enters the zone, triggering the cue that signals the delay. Rat icon indicates that decoding is aligned to zone entry. Solid box indicates that the training set for the decoder is based on responses to zone entry. Open circles indicate reward locations. (g,h) p(reward) at each zone for OFC (g) and vStr (h), defining the training set for decoding as neuronal activity at reward delivery and the test set as neuronal activity at each moment surrounding zone entry. The neural ensemble at the time of zone entry decoded the current reward type reliably. (i) For g,h, the training set is the reward flavor and the test set is neuronal activity when the rat enters the zone, triggering the cue (tone). Rat icon indicates that decoding is aligned to zone entry, as in f. Filled circles indicate that the training set is based on responses to reward delivery, as in c. Dashed lines indicate zone location.
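
The posteriors in this figure come from Bayesian decoding of ensemble activity; the exact implementation is described in the Online Methods. As a minimal sketch in the spirit of Zhang et al. (1998), under independent-Poisson assumptions with a flat prior and entirely hypothetical tuning data, the posterior over reward flavors can be computed from spike counts as follows:

```python
# Minimal sketch of one-step Bayesian population decoding (assumed form, not
# the authors' exact decoder). Training: mean firing rate of each cell for
# each condition (e.g., reward flavor). Test: posterior p(condition | counts)
# under independent-Poisson assumptions and a flat prior.
import numpy as np

def train_tuning(spike_counts, labels, n_conditions, window=1.0):
    """Mean firing rate (Hz) per cell per condition.
    spike_counts: (n_trials, n_cells) counts in a fixed window around the event.
    labels: (n_trials,) condition index for each trial."""
    rates = np.zeros((n_conditions, spike_counts.shape[1]))
    for c in range(n_conditions):
        rates[c] = spike_counts[labels == c].mean(axis=0) / window
    return rates

def decode_posterior(test_counts, rates, window=1.0):
    """Posterior over conditions for one test window of spike counts (n_cells,)."""
    expected = rates * window + 1e-9                                    # expected counts per cell
    log_post = (test_counts * np.log(expected) - expected).sum(axis=1)  # Poisson log-likelihood, flat prior
    log_post -= log_post.max()                                          # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical example: 4 flavors, 20 cells, 200 training trials
rng = np.random.default_rng(0)
true_rates = rng.gamma(2.0, 2.0, size=(4, 20))
labels = rng.integers(0, 4, size=200)
train_counts = rng.poisson(true_rates[labels])
rates = train_tuning(train_counts, labels, n_conditions=4)
test = rng.poisson(true_rates[2])            # a test window drawn from flavor 2
print(decode_posterior(test, rates))         # posterior typically peaks at index 2
```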

Figure 3: Representations of expected reward as a function of delay and threshold.

To determine whether OFC and vStr signals predicted behavior at the time of zone entry, we measured p(reward) at each zone for all offers above and below the threshold for a given rat for a given flavor-reward site (shaded area, s.e.m.). (a,b) Low-cost offers in which the rat waited through the delay (distribution of current reward was determined to be significantly different, empirical cumulative distribution function, significant at α = 0.05). (c,d) High-cost offers in which the rat skipped and did not wait through the full delay. (a,c) OFC. (b,d) vStr. (e) This decoding operation was based on a training set at the reward but a test set at zone entry.

Figure 4: Behavioral responses in regret-inducing and control situations.

All passes were rotated so as to align on entry into a current zone. Orientation was measured using the curvature measure as per the Online Methods. (a–c) Examples of approaches for each of the three conditions: regret-inducing, control 1 (same sequence but rat took previous option) and control 2 (two long delays in a row). Gray dots show all behavioral tracking samples from the example session. Blue dots show the current path taken in each example. The colors of the arrows correspond to the matching circular vector plots. Arrow directions indicate empirically determined curvature direction. In a regret-inducing example (a), when the rat entered the zone, he paused and looked backwards toward the previous zone. In a control 1 example (b), the rat looked toward the current reward spoke but proceeded on to the next zone. In a control 2 example (c), the rat looked toward the next zone but turned back toward the current reward. (d–f) Summary statistics. The first reorientation event was measured as per the Online Methods. Gray traces show all pausing reorientations over all instances in that condition. Heavy line shows vector average in each 120° arc. In regret-inducing conditions (d), rats tended to orient toward the previous zone or current spoke. In control 1 conditions (e), rats tended to orient only toward the current spoke. In control 2 conditions (f), rats tended to orient toward the next zone. The distributions in d–f were significantly different from each other (Watson's circular U; see text).
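
As a rough sketch of the circular summaries behind the vector plots (an assumed form, not the authors' curvature measure, and using hypothetical data), headings can be taken from successive tracking samples and reorientation directions summarized by a circular mean (resultant vector):

```python
# Minimal sketch (assumed): heading direction from position-tracking samples
# and a circular mean (resultant vector) over reorientation directions.
import numpy as np

def headings(x, y):
    """Heading angle (radians) between successive tracking samples."""
    return np.arctan2(np.diff(y), np.diff(x))

def circular_mean(angles):
    """Mean direction (radians) and resultant length of a set of angles."""
    vectors = np.exp(1j * np.asarray(angles))
    resultant = vectors.mean()
    return np.angle(resultant), np.abs(resultant)

# Hypothetical short track segment
x = np.array([0.0, 1.0, 2.0, 2.5])
y = np.array([0.0, 0.2, 0.1, 0.8])
print(np.degrees(headings(x, y)))

# Hypothetical reorientation directions clustered toward the previous zone (~180 degrees)
rng = np.random.default_rng(1)
angles = np.pi + 0.3 * rng.standard_normal(50)
mean_dir, resultant_len = circular_mean(angles)
print(np.degrees(mean_dir), resultant_len)
```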

Figure 5: Single reward cells in OFC and vStr during regret-inducing situations.

Top: OFC example cell during a regret-inducing situation. Gray dots represent individual spikes. Solid colored lines indicate Gaussian-smoothed activity, Gaussian width σ = 50 ms. Black, unflavored pellets; pink, cherry flavored; yellow, banana flavored; brown, chocolate flavored. Black dots in the center panel represent the position of the animal in this example lap during this instance. Red dots show the position of the animal when the cell in question fired spikes. The rat traveled in a counterclockwise direction. The maze has been aligned so that the current zone is represented by the bottom right zone. This particular cell responded most to entry into the cherry reward zone and little to entry into the banana reward zone. When the rat skipped a low-cost cherry zone opportunity and encountered a high-cost banana zone opportunity, the rat looked back toward the previous reward, and the activity of the cell approximated that of the cherry zone-entry response. Bottom (display same as top panel): vStr example cell during a regret-inducing situation after skipping the chocolate reward zone and arriving at the cherry reward zone.
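
The smoothed traces use a Gaussian kernel with σ = 50 ms. A minimal sketch of that kind of smoothing (bin size and example spike times here are illustrative assumptions, not the authors' exact pipeline):

```python
# Minimal sketch: Gaussian smoothing of a spike train into a firing-rate
# estimate (sigma = 50 ms, as in the figure); bin width is an assumption.
import numpy as np

def smoothed_rate(spike_times, t_start, t_stop, dt=0.001, sigma=0.050):
    """Gaussian-smoothed firing-rate estimate (Hz) on a regular time grid."""
    edges = np.arange(t_start, t_stop + dt, dt)
    counts, _ = np.histogram(spike_times, bins=edges)
    kernel_t = np.arange(-4 * sigma, 4 * sigma + dt, dt)
    kernel = np.exp(-0.5 * (kernel_t / sigma) ** 2)
    kernel /= kernel.sum() * dt                       # normalize so the output is in spikes/s
    rate = np.convolve(counts, kernel, mode="same")
    return edges[:-1], rate

# Hypothetical spike times (s) around a zone-entry event at t = 0
spikes = np.array([-0.42, -0.10, 0.03, 0.05, 0.07, 0.12, 0.31])
t, rate = smoothed_rate(spikes, -1.0, 1.0)
print(rate.max())
```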

Figure 6: Neural representations in OFC and vStr represent the previous zone during behavioral regret instances.

(a,b) In regret-inducing conditions, the p(zone) representation of the previous encounter was high after zone entry into the current zone for both OFC (a) and vStr (b) (shaded areas, s.e.m.). Green traces show decoding using shuffled inter-stimulus intervals. Decoding to the previous zone was significantly different from all other conditions, even after controlling for multiple comparisons (ANOVA; OFC, P << 0.001; vStr, P << 0.001; distribution significantly different as determined by empirical cumulative distribution function, significant at α = 0.05). (c) The conditions being decoded in a,b: the rat has skipped the previous offer, even though the delay was less than threshold for that restaurant, and has now encountered a delay greater than threshold for the current restaurant. (d–f) In the control 1 condition, the p(zone) representation of the current zone increased until the rat heard the cue indicating a long delay, at which time the representation changed to reflect the next zone. In control 1, p(zone) representations of the current and next zones were significantly different from the other zones (ANOVA; vStr, P << 0.001; OFC, P << 0.001), although they were not different from each other after controlling for multiple comparisons (ANOVA; vStr, P = 0.074; OFC, P = 0.619). OFC (d), vStr (e) and cartoon indicating condition (f). (g–i) In the control 2 condition, the p(zone) representations of both the current and previous zones increased when the rat heard the cue indicating a long delay (compared to other zones, ANOVA; OFC, P << 0.001; vStr, P << 0.001). OFC (g), vStr (h) and cartoon indicating condition (i). Decodings to the current and previous zones in control 2 were not significantly different from each other (ANOVA; OFC, P = 0.509; vStr, P = 0.268).
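
The green traces are described as decoding with shuffled inter-stimulus intervals. One way to implement such a control (an assumed interpretation, not necessarily the authors' exact shuffle) is to permute the intervals between zone-entry events before re-aligning the decoder, so that the alignment has the same interval statistics but no true relationship to the events:

```python
# Minimal sketch (assumed interpretation) of a shuffled inter-stimulus-interval
# control: permute the intervals between zone-entry events and rebuild the
# event times, then re-run the decoding aligned to the shuffled times.
import numpy as np

def shuffle_event_times(event_times, rng):
    """Permute inter-event intervals and rebuild event times from the first event."""
    event_times = np.sort(np.asarray(event_times))
    intervals = np.diff(event_times)
    return event_times[0] + np.concatenate(([0.0], np.cumsum(rng.permutation(intervals))))

rng = np.random.default_rng(3)
zone_entries = np.array([12.0, 30.5, 55.1, 61.0, 90.3, 122.7])   # hypothetical entry times (s)
print(shuffle_event_times(zone_entries, rng))
```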

Figure 7: Behavioral changes following potential regret instances.

(a) Comparing the proportion of stays to skips during each condition revealed that rats were more willing to wait for a reward following regret-inducing instances than control 1 instances (Wilcoxon, *P = 0.01) or control 2 instances (Wilcoxon, P = 0.06). (b) Rats spent less time consuming reward during regret than during non-regret instances. Typical handling time: mean = 25.3 s, s.d. = 12.2 s; regret handling time: mean = 15.2 s, s.d. = 14.2 s. Control handling times were distributed the same as all non-regret handling times.
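
A minimal sketch of the kind of Wilcoxon comparison reported in a, assuming per-session stay proportions paired across conditions (the pairing and all values here are hypothetical):

```python
# Minimal sketch (assumed pairing across sessions): compare the per-session
# proportion of stays following regret-inducing instances against control 1
# instances with a Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(5)
p_stay_regret = np.clip(0.6 + 0.1 * rng.standard_normal(20), 0, 1)     # hypothetical per-session proportions
p_stay_control1 = np.clip(0.5 + 0.1 * rng.standard_normal(20), 0, 1)
print(wilcoxon(p_stay_regret, p_stay_control1))
```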

Figure 8: Behavioral and neurophysiological correspondences during regret.

To determine whether the representations of the previous reward differed when the rat chose to stay at the high-delay (high-cost) current zone, we measured the ratio of the p(zone) representation of the previous zone to the p(zone) representation of the current zone from 0 to 3 s following zone entry for all conditions, separated by whether the rat skipped or stayed. Each panel shows a box plot of the distribution of p(zone_previous)/p(zone_current) ratios divided between stays and skips. Box limits are 25th and 75th percentiles, whiskers extend to data not considered outliers and outliers are plotted separately. (a) p(zone_previous)/p(zone_current) ratios from OFC ensembles during regret-inducing conditions. (b) p(zone_previous)/p(zone_current) ratios from vStr ensembles during regret-inducing conditions. (c,d) During control 1 conditions. (e,f) During control 2 conditions. Following regret-inducing instances, when rats were more willing to wait for reward, p(zone_previous) was greater than p(zone_current).
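
A minimal sketch of the p(zone_previous)/p(zone_current) ratio described above: the 0–3 s window and zone indexing follow the caption, while the decoded posterior array and the encounter data are hypothetical placeholders (for example, output of a decoder like the sketch after Figure 2).

```python
# Minimal sketch (assumed form): ratio of decoded posterior for the previous
# zone to the current zone over the 0-3 s after zone entry; such ratios would
# then be split by whether the rat stayed or skipped.
import numpy as np

def previous_to_current_ratio(p_zone, times, prev_idx, curr_idx, t0=0.0, t1=3.0):
    """p_zone: (n_timebins, n_zones) decoded posterior; times: (n_timebins,) s relative to zone entry."""
    window = (times >= t0) & (times <= t1)
    prev = p_zone[window, prev_idx].mean()
    curr = p_zone[window, curr_idx].mean()
    return prev / curr

# Hypothetical decoded posteriors for one encounter (4 zones)
times = np.linspace(-2, 5, 141)
rng = np.random.default_rng(4)
p_zone = rng.dirichlet(np.ones(4), size=times.size)
print(previous_to_current_ratio(p_zone, times, prev_idx=3, curr_idx=0))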

Author information

Affiliations

  1. Graduate Program in Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA.

    • Adam P Steiner
  2. Department of Neuroscience, University of Minnesota, Minneapolis, Minnesota, USA.

    • A David Redish

Contributions

A.P.S. and A.D.R. conducted the experiments, collected the data, performed the analysis and wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (24.2 MB)

    Supplementary Figures 1–17 and Supplementary Tables 1 and 2
