Figure 6: In some scenarios the optimal policy becomes even more complex than two parallel boundaries in the space of expected reward estimates.

From: Optimal policy for value-based decision-making

Figure 6

This property might, for example, break down if the utility that the decision maker receives from her choices is not the reward itself but instead a non-linear function of this reward. If this utility grows sub-linearly in the reward, as is frequently assumed, the decision boundaries approach each other with increasing expected reward, as higher rewards yield comparatively less additional utility. In such circumstances, optimal choices require tracking both expected reward estimates independently, rather than only their difference. To demonstrate this, we here assumed a utility function, utility = u(r), that saturates as r→∞ and r→−∞. This could be the case, for example, if rewards vary over a large range across which the subjectively perceived utility follows a non-linear, saturating function of the reward. (In this figure, u is modelled with a hyperbolic tangent function, but the exact details of the functional form do not qualitatively change the results.) The logic of the different panels follows that of Fig. 2. (a) The value function surface for choosing either of the two options. (b) The value surfaces for postponing the decision to accumulate more evidence for a period δt. (c) The two value surfaces superimposed. (d) The decision boundary and choice represented in the two-dimensional space of expected reward estimates. Note that the distance between the decision boundaries is narrower in the regime where estimated rewards are high on average, resembling ‘RMs’11,12, which are more sensitive to absolute reward magnitudes than DDMs.
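To make the narrowing of the boundaries concrete, the short Python sketch below (not part of the original figure or analysis) evaluates a hyperbolic tangent utility and shows that a fixed reward difference yields a smaller utility gain at higher baseline rewards. The scale parameter r0, the reward difference delta, and the sample reward values are illustrative assumptions rather than values used in the paper.

```python
# Minimal sketch (not the authors' code): why a saturating utility narrows
# the gap between decision boundaries at high expected rewards.
# Assumption: utility is modelled as u(r) = tanh(r / r0); r0, delta and the
# sample reward values below are arbitrary illustrative choices.
import numpy as np

def utility(r, r0=1.0):
    """Saturating utility: grows sub-linearly and levels off as r -> +/- infinity."""
    return np.tanh(r / r0)

delta = 0.5  # a fixed difference in expected reward between the two options
for r in [0.0, 1.0, 2.0, 4.0]:
    # The utility gained from resolving the same reward difference 'delta'
    # shrinks as the baseline expected reward r grows.
    gain = utility(r + delta) - utility(r)
    print(f"r = {r:4.1f}  utility gain from +{delta} reward = {gain:.3f}")
```

Because the extra utility at stake shrinks when both rewards are high, accumulating further evidence is worth less in that regime, which is the intuition behind the boundaries in panel (d) moving closer together.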
