(a) The decision boundaries for value-based decisions that maximize the reward per unit time. The average a priori reward is varied from −1 to 8, while the other parameters are held fixed. The inset shows two examples of prior distributions (blue and red: mean reward 0 and 4, respectively). The figure illustrates that the optimal decision boundaries depend on the prior knowledge about the average reward across trials. (b) The decision boundaries that maximize the number of correct responses per unit time for accuracy-based decisions. (The ‘correct rate’ in value-based decisions can be defined, for example, as the probability of choosing the more preferable option.) These boundaries do not vary with the average a priori reward and are therefore all plotted on top of each other. Here, the decision boundaries were derived with the same dynamic-programming procedure as in the value-based case, except that the rewards were assumed to be binary, equal to one only if the decision maker correctly identified the option with the larger ‘reward’ cue zj (see Methods section). In contrast to the reward-rate-maximization strategy for value-based decisions (a), the decision strategy that maximizes the correct rate is invariant to the absolute values of the mean reward/evidence strength, demonstrating a qualitative difference between value-based and perceptual decision-making in terms of the optimal strategy. In addition, the optimal boundaries in the value-based case approach each other more rapidly over time than for perceptual decisions. This faster boundary collapse for value-based decisions is consistent across a broad range of mean absolute rewards, showing that the distinction in boundary dynamics is not merely due to the difference in expected reward rates, but reflects a qualitative difference between the geometries of the value functions in the two tasks.
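The accuracy-maximizing boundaries in (b) can be sketched with a backward-induction (dynamic-programming) computation over belief states. The sketch below is illustrative only: all parameter values (sigma, mu, c, dt, T, the evidence grid) are assumptions, not the paper's, and the evidence transition is approximated by a single Gaussian kernel with the drift marginalized over the posterior.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper).
sigma = 1.0   # evidence noise (diffusion standard deviation per unit time)
mu = 1.0      # drift magnitude; its sign (the correct option) is unknown, priors 0.5/0.5
c = 0.1       # cost of sampling per unit time
dt = 0.05     # time step
T = 3.0       # decision horizon
n_steps = int(T / dt)

x = np.linspace(-6.0, 6.0, 401)          # grid over accumulated evidence x_t

def p_correct(xv):
    # Posterior probability that the drift is positive, given evidence xv,
    # for a two-hypothesis diffusion with drifts +mu and -mu.
    return 1.0 / (1.0 + np.exp(-2.0 * mu * xv / sigma**2))

# Binary reward: 1 only when the more likely option is in fact correct,
# so the value of stopping is the probability of a correct choice.
stop_val = np.maximum(p_correct(x), 1.0 - p_correct(x))

# Gaussian transition kernel for one diffusion step; the drift is
# marginalized over the posterior (a coarse approximation for a sketch).
mean_dx = (2.0 * p_correct(x) - 1.0) * mu * dt
std_dx = sigma * np.sqrt(dt)
W = np.exp(-0.5 * ((x[None, :] - (x[:, None] + mean_dx[:, None])) / std_dx) ** 2)
W /= W.sum(axis=1, keepdims=True)

V = stop_val.copy()                      # value at the horizon: forced to stop
boundary = np.empty(n_steps)
for t in range(n_steps - 1, -1, -1):
    cont = W @ V - c * dt                # expected value of sampling once more
    V = np.maximum(stop_val, cont)
    # Decision boundary: smallest nonnegative evidence at which stopping is optimal.
    boundary[t] = x[(x >= 0) & (stop_val >= cont)].min()
```

Because the stopping reward here depends only on the posterior probability of being correct, rescaling the rewards leaves these boundaries unchanged, which is the invariance the caption contrasts with the value-based case; the value-based procedure differs only in replacing the binary reward with the expected reward of the chosen option.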