A Pavlovian account for paradoxical effects of motivation on controlling response vigour

Oudiette, Delphine; Vinckier, Fabien; Bioud, Emmanuelle; Pessiglione, Mathias

doi:10.1038/s41598-019-43936-7

Article
Open access
Published: 20 May 2019

Delphine Oudiette^1,2,
Fabien Vinckier^1,3,
Emmanuelle Bioud¹ &
…
Mathias Pessiglione¹

Scientific Reports volume 9, Article number: 7607 (2019)

Abstract

In high stakes situations, people sometimes choke under pressure, performing below their abilities. Here, we suggest a novel mechanism to account for this paradoxical effect of motivation: the automatic adjustment of action vigour to potential reward. Although adaptive on average, this mechanism may impede fine motor control. Such detrimental effect was observed in three studies (n = 74 in total), using behavioural tasks where payoff depended on the precision of handgrip squeezing or golf putting. Participants produced more force for higher incentives, which aggravated their systematic overshooting of low-force targets. This reward bias was specific to action vigour, as reward did not alter action timing, direction or variability across trials. Although participants could report their reward bias, they somehow failed to limit their produced force. Such an automatic link between incentive and force level might correspond to a Pavlovian response that is counterproductive when action vigour is not instrumental for maximizing reward.

Introduction

During a professional golf tournament, a famous champion once missed a hole that was only one meter away on the green. ‘Even I can do it,’ said someone in the audience. The champion held the club out to the amateur and challenged him: ‘you show us’. The amateur took up the challenge and … executed a perfect putt. ‘But this is not the real game,’ said the champion. ‘I want you to try again, and if you succeed, you can pocket this one-million-dollar check.’ The amateur tried, and missed.

As illustrated in this legendary anecdote, our skills sometimes fail us when we need them the most. This is a paradoxical effect of motivation because performance should scale with expected reward if behaviour followed rational principles. Indeed, optimal control theory, which accounts well for behavioural performance in a variety of tasks, posits that control resources are allocated so as to minimize costs and maximize benefits^1,2,3. More control should, therefore, be exerted when more money is at stake, leading to a better performance. More precisely, greater control should be exerted when the instrumentality of performance (i.e., in economic terms, its marginal utility) is higher, meaning when each unit of performance has a higher impact on the outcome⁴.

However, paradoxical effects of monetary incentives have been reported not only in sports but also in academic and business settings^5,6. These paradoxical effects have been assimilated to a phenomenon known as ‘choking under pressure’, whose broad sense is performing sub-optimally when desire to perform well is maximal. After the seminal observation of an inverted U-shaped relationship between learning performance and the importance of potential outcomes⁷, choking has been interpreted as arising from hyper-arousal or over-motivation. These notions are compatible with two neuroimaging studies that have linked performance decrements for high incentives with activity in reward circuits (ventral striatum and midbrain area) and with individual measures of either financial motivation to gain money or loss aversion^8,9. However, hyper-arousal and over-motivation remain quite vague as cognitive explanations of pressure effects.

Two specific cognitive theories were later proposed and extended to include the detrimental effect of any external stressor, typically the presence of an audience (for reviews see^{10,11,12,13,14}). The distraction theory assumes that pressure directs attention towards task-irrelevant thoughts and worries, which reduces the amount of attentional control devoted to the task and thus degrades performance. The explicit monitoring theory assumes the opposite: pressure triggers excessive attentional control, which prevents the actor from using previously optimized procedural routines and thus disrupts performance. Both theories have received empirical support and are considered as valid explanations of pressure effects on various motor and cognitive skills, including golf putts, tennis serves, basketball free throws, working memory, mathematical problem solving, etc.^{15,16,17,18,19,20,21,22}. However, they do not specify which aspect of motor or cognitive processes is disrupted under pressure.

Here, we suggest a more specific explanation for the detrimental effect of increasing incentives. Our idea is that potential rewards trigger direct specifications of behaviour, which may antagonize correct performance. This idea builds on the notion of Pavlovian behaviour, meaning an automatic response to a specific cue that bypasses a proper cost-benefit analysis of the situation^23,24,25. Pavlovian responses are considered to be evolutionarily ancient and hard-wired in brain circuits. Even in species with a well-developed prefrontal cortex, such as primates, they can offer a good compromise between shortening deliberation time and adjusting behaviour to the context. Indeed, Pavlovian responses are supposed to be adaptive on average, at least for the environment in which they were naturally selected. The persistence of Pavlovian responses is therefore key to understanding human behaviour and why it may be maladaptive in certain situations of the modern world.

Specifically, we hypothesized that incentives energize behaviour, in the sense that potential rewards boost action vigour, which can be measured as force in Newtons. This may not sound novel, as the link between expected reward and physical effort allocation is both intuitive and well supported by empirical studies (for recent reviews see^26,27,28). What we add here is that the link is somewhat automatic: more physical effort may be allocated to action with higher expected reward, even when doing so is not instrumental (i.e. when it does not bring more reward). A Pavlovian account also makes a prediction about when pressure effects are adaptive: in situations where reward magnitude or probability increases with action vigour. One may speculate that this sort of situation was frequent in ancient evolutionary times – when, for instance, running faster would increase the probability of catching a prey.

In previous studies, we have explored this sort of adaptive situation using a task where monetary reward is proportional to handgrip force. Specifically, the proportion of the monetary incentives that participants win corresponds to the percentage of the maximal force that they produce. The standard behaviour in this task is to produce more force for higher incentives, with incentives being varied on a trial-by-trial basis (e.g.^29,30,31). To test the situation where action vigour is not instrumental, we simply turned the force task into a precision task by changing the payoff rule: the eventual reward now decreases with the distance between the produced force and a target force. Thus, payoff is orthogonal to action vigour: in order to maximize reward, force must be neither too low nor too high. An automatic link between incentive and force would even be detrimental in cases where targets tend to be overshot, and adaptive in the opposite case. The critical prediction is therefore that increasing incentives should aggravate the overshooting of low-force targets, leading to a paradoxical effect of motivation on controlling action vigour.

To be honest, we initially observed this link, hereafter termed ‘reward bias’, in two experiments that were designed for other purposes³² (Bioud et al., unpublished data), and which are presented here as supplementary information. In the following, we report three replication studies, where we first validate the concept of a reward bias, then isolate incentive effects on force from potential effects on timing and direction, and finally generalize the phenomenon to a more ecological paradigm (golf putting).

Results

For all three experiments presented in the main text, participants were trained at a motor precision task (hand gripping or golf putting), first with online visual feedback, then with offline feedback only, and finally without any feedback on performance, as in the test blocks (see Fig. 1). The training trials enabled participants to calibrate the motor command required to hit the target. In test trials, they only had proprioceptive feedback on their hand movement, but no visual feedback on the distance to target. Of note, when provided with visual feedback, participants were on target at every trial, because they could correct their initial force pulse. Suppressing visual feedback was therefore crucial to get a non-trivial task where participants make errors. Between test blocks, they had a short retraining session with performance feedback again, to maintain calibration throughout the experiment.

Critically, participants’ degree of motivation was manipulated in test blocks by varying the amount of reward at stake (i.e., the maximal incentive) on a trial-by-trial basis (6 levels ranging from 5c to 50€). Participants were instructed that their payoff for each trial would be inversely proportional to the distance from target. Thus, they would win the full reward if they were exactly on target and nothing if they were at maximal distance (i.e., staying on start position or reaching a symmetrical position on the other side of target). In a subset of trials, they were also asked to position their shot with respect to target, in order to get a subjective estimation of their performance, and thereby assess their knowledge of potential bias.
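The payoff rule described above can be sketched as a short function. This is only an illustration under stated assumptions: the payoff is assumed to fall linearly from the full incentive on target to zero at the maximal distance (the start position, or the symmetrical position on the other side of the target). The name `precision_payoff` and its parameters are hypothetical, and the text does not give the exact mapping used in the experiments.

```python
def precision_payoff(produced, target, incentive, start=0.0):
    """Payoff for one trial of the precision task (assumed linear rule).

    produced  -- produced force (or ball distance), same unit as target
    target    -- target value, e.g. 20 for a 20% MVC target
    incentive -- maximal reward at stake on this trial
    start     -- start position; the maximal distance is |target - start|
    """
    max_distance = abs(target - start)
    if max_distance == 0:
        raise ValueError("target must differ from the start position")
    distance = abs(produced - target)
    # Full incentive on target, nothing at or beyond the maximal distance.
    fraction = max(0.0, 1.0 - distance / max_distance)
    return incentive * fraction
```

Under this assumed rule, overshooting a 20% target by 5 points forfeits a quarter of the incentive, whatever its size, so a force increase that scales with the incentive costs the most on the trials that are worth the most.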

Experiment 1

In experiment 1, 24 participants performed a motor precision task (Fig. 2A) which consisted in squeezing a handgrip such that peak force would be as close as possible to a given target force level that was either low (20% of MVC), medium (45% of MVC), or high (70% of MVC). Trial-by-trial series of peak forces were analyzed with a simple linear model that included a constant (to assess the offset of motor performance), the incentive level (to assess the effect of motivation) and the trial index (to assess the effect of fatigue). This model was fitted separately for the different target forces. The same analysis was applied to peak force estimates that participants provided in one third of trials.
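The per-participant linear model can be sketched with ordinary least squares. This is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and z-scoring both regressors is an assumption (the text mentions a z-scored incentive regressor only for a later analysis). With z-scored regressors, the constant equals the mean peak force for that target.

```python
import numpy as np

def zscore(x):
    """Center a regressor and scale it to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fit_force_model(peak_force, incentive, trial_index):
    """Fit peak_force = constant + b_inc * incentive + b_trial * trial_index.

    Regressors are z-scored (an assumption), so the constant is the
    mean peak force across the fitted trials.
    """
    y = np.asarray(peak_force, dtype=float)
    X = np.column_stack([np.ones_like(y), zscore(incentive), zscore(trial_index)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    constant, beta_incentive, beta_trial = coefs
    return constant, beta_incentive, beta_trial
```

The function would be called once per participant and per target force, and the resulting coefficients would then be tested across participants.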

The motor bias: participants overshot low target force and undershot high target force

To assess motor performance we compared the constant parameters (irrespective of incentive level and trial index) with target force level. Participants overshot low targets (22.7 ± 0.99 vs. 20%, t(23) = 2.7; p = 0.012) and undershot high targets (63.9 ± 1.4 vs. 70%, t(23) = −4.5, p = 0.0002). They were on target only for medium level (45.9 ± 0.94 vs. 45%, t(23) = 0.95; p = 0.35). Thus, the low and high targets seemed more difficult to hit, possibly because small and big forces are harder to produce (see Fig. 2B). To test this idea, we presented target forces two by two and asked participants which they would choose if they had to do one more session of testing: participants preferred low to medium targets (73 vs. 28%), medium to high targets (68 vs. 32%), and low to high targets (82 vs. 18%). This preference pattern suggests that perceived cost was related to physical effort more than to task difficulty.

The reward bias: participants squeezed harder for higher rewards, even for low targets

To assess motivation effects we compared the regression weight on incentive level to zero. For every target we found a significantly positive effect on force production (target = 20% MVC, β_Incentive = 0.52 ± 0.13, t(23) = 4.0, p = 0.0006; target = 45%, β_Incentive = 0.99 ± 0.27, t(23) = 3.7, p = 0.0012; target = 70%, β_Incentive = 1.4 ± 0.32, t(23) = 4.2, p = 0.003). Thus, the higher the prospective incentive, the harder participants squeezed the handgrip, even when it was detrimental to do so, such as when target force was low (see Fig. 2B). This reward bias was larger for high targets compared to low ones (high vs. low target, t(23) = 2.4, p = 0.022; high vs. medium target, t(23) = 0.86, p = 0.39; medium vs. low target, t(23) = 1.6, p = 0.13).
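The group-level test of regression weights against zero can be sketched as a one-sample t statistic. This is a minimal sketch under assumptions: the function name is hypothetical, and it takes one fitted β_Incentive per participant as input. The reported values are consistent with the ± terms being standard errors of the mean. For the low target, for example, 0.52 / 0.13 = 4.0, which matches the reported t(23) = 4.0.

```python
import math

def one_sample_t(values, reference=0.0):
    """Return (t, df) for a one-sample t-test of mean(values) against reference."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Unbiased sample variance (n - 1 in the denominator).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sem = math.sqrt(variance / n)  # standard error of the mean
    return (mean - reference) / sem, n - 1
```

The p-value would then be read from a Student t distribution with the returned degrees of freedom. Setting `reference` to the target force gives the comparison of model constants with target levels.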

The decalibration: the motor bias was aggravated with time on task

To assess fatigue effects we compared the regression weight on trial index to zero. Trial index was reinitialized at the beginning of each block of 18 trials. Across trials (but within blocks), participants increasingly overshot the low targets (target = 20%, β_Trial = 0.62 ± 0.21, t(23) = 3.0, p = 0.007) and increasingly undershot the high targets (target = 70%, β_Trial = −1.4 ± 0.49, t(23) = −2.9, p = 0.0078). There was a non-significant trend for an increase in force over trials for medium targets (target = 45%, β_Trial = 0.79 ± 0.39, t(23) = 2.04, p = 0.054). Note that between blocks, participants were recalibrated with visual feedback allowing comparison of performance to target force. Thus, decalibration is not due to muscular fatigue, which would globally decrease force, but to mere forgetting. Indeed, in the absence of feedback, participants regressed to the middle of the force range, as if the mapping between motor command and target force had to be learned again.

Subjective estimation: a reward bias but no overshoot

A similar linear regression on force estimates revealed that participants were aware of the reward bias: they reported higher force production for higher incentives, irrespective of target level (target = 20%, t(18) = 3.08, p = 0.0065; target = 45%, t(18) = 3.0, p = 0.0076; target = 70%, t(18) = 4.1, p = 0.0007). However, participants seemed not fully aware that they were overshooting low targets: as shown by the comparison of intercepts, estimated force was significantly below produced force (target = 20%, 19.9 ± 0.57% vs. 22.2 ± 0.98%, t(18) = −2.89; p = 0.0096). There was no significant difference between estimated and produced force for medium and high targets. Furthermore, force estimates were not significantly modulated by trial index, whatever the target force.

Control of variability: no effect of incentive motivation

In this precision task, motivation induced by potential reward may be expected to improve control of grip force, and therefore to decrease the variability of forces produced across trials. To test this possibility, we simply regressed inter-trial variance against incentive level. We found no significant effect (target = 20%, t(23) = 1.3, p = 0.19; target = 45%, t(23) = −1.5, p = 0.14; target = 70%, t(23) = −1.04, p = 0.31). This suggests that higher incentives increased the amount of force produced but did not reduce its variability, which could have mitigated the detrimental impact of the reward bias on performance.
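The variability analysis can be sketched as follows. This is a minimal sketch, assuming that variance is computed across the trials of each incentive level for one participant and target, and then regressed on incentive level. The function name is hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def variability_slope(peak_force, incentive):
    """Slope of inter-trial variance against incentive level.

    Requires at least two trials per incentive level.
    """
    peak_force = np.asarray(peak_force, dtype=float)
    incentive = np.asarray(incentive, dtype=float)
    levels = np.unique(incentive)
    # Sample variance of peak force within each incentive level.
    variances = np.array(
        [peak_force[incentive == level].var(ddof=1) for level in levels]
    )
    slope, _intercept = np.polyfit(levels, variances, 1)
    return slope
```

A negative slope would indicate tighter force control for higher incentives. The per-participant slopes would then be tested against zero at the group level.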

Experiment 2

In experiment 2, we tested for the specificity of the reward effect, by adding time as a second dimension that participants had to control. The question was whether the reward bias would also be reflected in response time, meaning that people would go not only stronger but also faster for bigger reward. Thus, 26 participants performed a variant of the motor precision task (Fig. 3A), in which they had to squeeze a handgrip such that the peak would not only hit a predefined target force level (20%, 45% or 70% of MVC as in experiment 1), but also do so at the right time (0.55 s, 1.1 s or 2.2 s after the go signal). Fast and slow targets (0.55 s and 2.2 s) were only tested with the medium target force (45% of MVC). Force targets were therefore compared only at the 1.1 s timing, and time targets only at the 45% force. Peak force and time were analyzed separately for the different targets, as in experiment 1, with a simple linear model that included incentive level and trial index. The same analysis was applied to perceived peak force and time, which participants indicated in all trials.

The motor bias

Participants overshot not only low force targets (27.5 ± 1.4% vs. 20%, t(25) = 5.5; p < 0.0001) but also medium force targets (48.4 ± 1.6% vs. 45%, t(25) = 2.2, p = 0.039), and had a non-significant tendency to undershoot high force targets (67.1 ± 1.9% vs. 70%, t(25) = 1.2, p = 0.14). The overshoot of low targets was even larger than in the previous experiment (22.7 ± 0.99% in Exp 1 vs. 27.5 ± 1.4% in Exp 2; t = −2.9; p = 0.0066), possibly because the demand for controlling a second dimension acted as a stressor or distractor. Regarding timing, participants were slightly slower than required for all time targets (fast target: 0.59 ± 0.007 s vs. 0.55 s, t(25) = 3.4, p = 0.0024; medium target: 1.15 ± 0.008 s vs. 1.1 s, t(25) = 2.7, p = 0.014; slow target: 2.27 ± 0.017 s vs. 2.2 s, t(25) = 2.9, p = 0.008).

The reward bias

Similarly to experiment 1, we found that higher rewards predicted higher force (see Fig. 3B), for every target (target = 20%, β_Incentive = 0.51 ± 0.14, t(25) = 3.7, p = 0.001; target = 45%, β_Incentive = 0.82 ± 0.14, t(25) = 6.1, p < 0.0001; target = 70%, β_Incentive = 1.2 ± 0.24, t(25) = 5.0, p < 0.0001). This reward bias was not observed in the time dimension, except for fast targets (target = 0.55 s, β_Incentive = −0.0084 ± 0.0011, t(25) = −2.3, p = 0.029; target = 1.1 s; β_Incentive = −0.0055 ± 0.0025, t(25) = −1.5, p = 0.15; target = 2.2 s, β_Incentive = 0.0074 ± 0.0037, t(25) = 1.3, p = 0.22). However, contrary to the reward bias observed in force level, the slight speeding effect observed with fast targets was not detrimental, as it led timing closer to that required.

Decalibration

As seen in experiment 1, the overshooting of low targets was aggravated with trials without feedback (target = 20%, β_Trial = 0.54 ± 0.21; t(25) = 2.6; p = 0.017), as was the undershooting of high targets (target = 70%, β_Trial = −1.80 ± 0.53; t(25) = −3.3, p = 0.0027). This corresponds to the decalibration effect already observed in experiment 1. The tendency to squeeze slower than required was even amplified with the number of trials for all targets (target = 0.55 s, β_Trial = 0.012 ± 0.0016, t(25) = 2.7; p = 0.013; target = 1.1 s, β_Trial = 0.026 ± 0.0023, t(25) = 4.2, p = 0.0002; target = 2.2 s, β_Trial = 0.04 ± 0.0041, t(25) = 4.1, p = 0.0004). Contrary to the decalibration effect observed on force level, this can be distinguished from a regression to the average target, and may arise from physical fatigue, which would slow down force production. Alternatively, the general slowing could mean that participants had a natural tendency to underestimate duration, which was partially corrected by training.

Subjective estimates

We found again a reward bias in force estimates but only for low and high targets (target = 20%, β_Incentive = 0.50 ± 0.18; t(25) = 2.8, p = 0.0092; target = 45%, β_Incentive = 0.2 ± 0.15, t(25) = 1.4, p = 0.19; target = 70%, β_Incentive = 0.49 ± 0.23, t(25) = 2.2, p = 0.039). As in experiment 1, reported force was lower than produced force and thus closer to the target, indicating that participants underestimated the overshoot of low target (target = 20%, 23.1 ± 0.93% vs. 27.5 ± 1.4%, t(25) = −3.2; p = 0.0037).

Similarly, time estimates were shorter than actual times, and closer to the targets (target = 0.55 s, 0.55 ± 0.0065 s vs. 0.59 ± 0.0065 s, t(25) = −2.5, p = 0.018; target = 1.1 s, 1.10 ± 0.008 s vs. 1.15 ± 0.008 s, t(25) = −2.6, p = 0.014; target = 2.2 s, 2.20 ± 0.077 s vs. 2.27 ± 0.017 s, t(25) = −2.6, p = 0.014).

Control of variability

Consistently with experiment 1, we found no significant effect of reward on force variability (target = 20%, t(25) = 0.76, p = 0.46; target = 45%, t(25) = 1.0, p = 0.33; target = 70%, t(25) = 0.44, p = 0.66), suggesting again that higher incentives increased the amount of force produced but not the quality of motor control. Similarly, incentive level had no significant effect on time variability (target = 0.55 s, t(25) = −1.5, p = 0.14; target = 1.1 s, t(25) = −0.47, p = 0.65; target = 2.2 s, t(25) = −1.7, p = 0.095).

Experiment 3

In the first two experiments, we found that participants produced more force when potential reward was higher, an effect that was detrimental when small force was required (20% MVC target). However, this detrimental reward bias was linked to the overshoot of low targets, which could be due to the presence of higher targets in the same experiments. In other words, overshooting could arise from a regression to the mean target, because participants might forget about the mapping between proprioceptive input and the different required force levels, as shown with the decalibration effect. The first aim of experiment 3 was to probe reward bias and overshoot in a situation where only one single fixed target was presented during the entire experiment, to avoid any confusion that could happen when different target levels are presented within the same experiment. The second aim was to disentangle the force dimension from the precision of movement, which were confounded in previous experiments. The idea was to test whether, during a more complex gesture, excessive force would translate into a general disorganization of movement control. The last aim was to assess whether our effects of interest would occur in a more ecological situation.

We therefore employed a golf simulator to examine putting movement, in another sample of 24 participants (see Fig. 4A). The distance to the hole was kept constant (7.6 m) and potential rewards were varied as in previous experiments. Our dependent variables were the distance travelled by the ball (an indirect measure of the force exerted on the ball with the club) and the (unsigned) angle deviation from the straight line to the hole (a measure of direction accuracy). In all trials, participants also had to estimate the endpoint of their ball trajectory, with respect to the hole. This estimate was decomposed into perceived distance and angle. As before, actual force and direction were analyzed separately with a linear model including incentive level and trial number. The same linear model was fitted to perceived force and direction.

The motor bias

As with the low target force in the first two experiments, the model intercept showed that participants tended to overshoot the hole on average, independently of incentive level, but this effect was not significant (7.9 ± 0.19 vs. 7.6 m, t(23) = 1.6, p = 0.13, see Fig. 4B).

The reward bias

Consistent with the first two experiments, higher force was exerted for higher incentive level, such that a longer distance was covered by the ball (β_Incentive = 0.07, t(23) = 2.6, p = 0.016). Because we noticed a slight non-linearity, we tested the possibility of a quadratic relationship between distance and incentive level, by squaring the (z-scored) incentive regressor. The quadratic effect of incentive level was not significant (t(23) = 0.23, p = 0.82), arguing against an inverted U-shaped relationship between performance and incentive level. Moreover, when we included both the linear and quadratic functions of incentive level in the same GLM, the linear link was still significant (t = 2.6, p = 0.018), but the quadratic link was not (t = 0.29, p = 0.18). In contrast to the effect on distance, incentive level had no significant influence on direction accuracy, i.e. the unsigned angle between ball trajectory and straight line (GLM, β_Incentive = −0.029, t(23) = −0.47, p = 0.64).
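The joint test of linear and quadratic incentive effects can be sketched by adding the squared z-scored incentive as a third column of the design matrix. This is a minimal sketch with a hypothetical function name. The trial-index regressor is omitted for brevity, which is a simplification rather than a description of the authors' model.

```python
import numpy as np

def fit_linear_quadratic(distance, incentive):
    """Fit distance = constant + b_lin * z + b_quad * z**2.

    z is the z-scored incentive level, and the quadratic regressor is
    its square.
    """
    y = np.asarray(distance, dtype=float)
    x = np.asarray(incentive, dtype=float)
    z = (x - x.mean()) / x.std()
    X = np.column_stack([np.ones_like(y), z, z ** 2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    constant, beta_linear, beta_quadratic = coefs
    return constant, beta_linear, beta_quadratic
```

An inverted U-shaped relationship would show up as a reliably negative quadratic weight across participants. A purely linear relationship leaves that weight near zero.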

Decalibration

The overshoot was aggravated with time on task: participants sent the ball further away from the hole as the number of trials without feedback increased (β_Trial = 0.18, t(23) = 3.4, p = 0.0024). Contrary to previous experiments, this decalibration of force production cannot be interpreted as a regression to mean target; it may rather represent a natural tendency to exert too much force, which must be tempered by training. In contrast to force, there was no effect of trial number on direction accuracy (GLM, β_Trial = 0.03, t(23) = 0.54, p = 0.59).

Subjective estimates

As in previous experiments, there was a trend for the reward bias to be reflected in subjective estimates of distance, but this trend was not significant (β_Incentive = 0.05; t(23) = 1.6, p = 0.12). Participants also tended to underestimate the overshoot, but perceived distance was not significantly shorter than the actual distance reached by the ball (7.8 ± 0.15 vs. 7.9 ± 0.18, t = −0.47, p = 0.64). However, participants significantly underestimated the deviation from target direction, as shown by angle estimates (1.97 ± 0.10 vs. 2.95 ± 0.16°; t(23) = −11.2; p < 0.0001).

Control of variability

Contrary to previous experiments, we observed a significant decrease in force variability for higher incentive level (t(23) = −2.58, p = 0.017). This suggests that motivation helped participants to exert better control on their force, without correcting for their reward bias. There was no effect of incentive level, however, on angle variability (t(23) = −1.41, p = 0.17).

In this Results section, we have provided statistical evidence in favour of the reward bias separately for each experiment. Note that each of the three independent replications would hold if we applied a correction for multiple comparisons to the significance threshold. Our analysis is more conservative than pooling the experiments and calculating a global reward bias. Indeed, the global p-value of the β_Incentive estimates over the three main experiments (n = 74 participants) was p = 2.91 × 10⁻¹⁰.

Supplementary experiments

We provide as supplementary material some key information about the experimental design and results of the two experiments in which we first observed the reward bias. These experiments were performed for other purposes, as explained in corresponding papers, before the preceding main three experiments. However, they both assessed the reward bias, because they involved producing low forces for varying incentive levels. Thus, they provide additional evidence in other groups of participants (n = 80 in total), using different versions of the experimental design. They bring valuable information about the generality of the reward bias because 1) they use a single (low force) target, as in the golf experiment, thus avoiding any possibility of confusion between targets, 2) they include an additional factor with potential loss (in case of poor performance), which did not exert any bias on force production and 3) they implemented a binary payoff procedure (gain/loss depending on hitting/missing target) instead of the proportional rule (the closer to the target, the higher the payoff). More details can be found in supplementary figure legends.

Discussion

Our results provide evidence for a reward bias, defined as the production of stronger force for higher incentive, even when this response is not adaptive. We initially observed this phenomenon in two studies tackling other issues³² (Bioud et al., unpublished data). We replicated this finding in three other studies, first with handgrip force precision and then with golf putting. Critically, the reward bias was detrimental in situations where small to moderate force had to be precisely exerted, because it aggravated the overshoot and thus increased the distance from target.

We believe this phenomenon is different from what has been subsumed under the ‘choking under pressure’ umbrella and traditionally explained by either distraction or explicit monitoring theories^11,12,14. The effects were quite specific in our studies: the impact of incentives was restricted to action vigour, without affecting the timing or direction of movements. By timing we mean adjusting to target time the moment of reaching force peak. This can be done independently of action vigour, as can be the direction of movement to target location in 3D space. We note that the global null result on timing seems inconsistent with the intuitive notion that rewards speed behaviour, as we occasionally observed (with fast targets). The discrepancy likely comes from the fact that, in our design, reward was conditioned on timing precision, thereby penalizing both early and late responses. Also, the variability of performance was not aggravated by higher incentives, as either the relaxation of attentional control or trembling due to increased muscular tension would predict. Finally, we observed a linear relationship between peak force and incentive level, which does not match the theoretical inverted U-shaped function linking performance to arousal^7,33. Importantly, when we included a quadratic function of incentive level in the model used to fit the data, the linear term remained significant, but the quadratic term was not. This confirms that force production was a linear function of incentive values. Note that we used ordinal values in our linear model. With cardinal values the function would become concave, thus contradicting the hyper-arousal theory, which would predict a convex function if choking dramatically exploded with increasing incentives. It is unlikely that we solely sampled the right half of a hidden U-shaped function, because our low incentive level (5 cents) was already quite negligible. 

Of course, our observations do not invalidate the other theories mentioned above. Rather, we see the reward bias as yet another pathway for pressure effects, which are likely to reflect a collection of various cognitive processes.
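The remark above about ordinal versus cardinal incentive values can be illustrated numerically. In this sketch, only the two endpoints (5 cents and 50 €) come from the text. The four intermediate amounts, the force coefficients and the function names are hypothetical placeholders chosen for illustration. The point is that a force that is linear in incentive rank becomes concave when plotted against monetary amounts that grow faster than their rank.

```python
def successive_slopes(xs, ys):
    """Slope of each segment between consecutive (x, y) points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

def is_concave(xs, ys):
    """True if segment slopes strictly decrease from left to right."""
    slopes = successive_slopes(xs, ys)
    return all(later < earlier for earlier, later in zip(slopes, slopes[1:]))

ranks = [1, 2, 3, 4, 5, 6]                 # ordinal incentive levels
# Hypothetical amounts in euros; only 0.05 and 50.0 are given in the text.
euros = [0.05, 0.5, 1.0, 5.0, 10.0, 50.0]
# Hypothetical force that is exactly linear in incentive rank.
force = [20.0 + 0.5 * rank for rank in ranks]

linear_in_rank = not is_concave(ranks, force)   # constant slope per rank
concave_in_euros = is_concave(euros, force)     # shrinking slope per euro
```

With these placeholder amounts, each additional euro adds less force than the previous one, which is the opposite of an effect that explodes at the highest incentives.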

Our reward bias is better described as the activation of an automatic link between expected reward and action vigour. Indeed, participants noted that their force increased with incentive level but did not correct for this bias, as if it had been irrepressibly activated. This automatic impulse is therefore different from the subconscious motivation of effort exertion that has been obtained using subliminal presentation of potential rewards^{30,34,35,36,37}. It remains possible that, even if participants reported having produced more force in high-incentive trials, they did not fully realize the systematic mapping from incentive level to force production. In the terminology of behavioural ecology^23,24,25, the reward bias would thus be interpreted as the intervention of a Pavlovian controller, which would distort the action driven by the other controllers – the habit and goal-directed systems. This is at odds with the explicit monitoring theory, which could be interpreted in this framework as the intervention of a goal-directed controller, to the detriment of a better-suited habit-based performance. However, relative to habit-based, goal-directed control would only be detrimental in skilled participants, such as professional golfers in our last experiment.

At the neural level, the reward bias may correspond to the activation of the mesolimbic dopaminergic pathway including midbrain dopaminergic nuclei and the ventral striato-pallidal complex. Neuroimaging studies have implicated this pathway in mediating the effects of incentives on mental and physical effort, in situations where effort exertion is instrumental^30,38. The link between incentive and effort is impaired by the disruption of this pathway, through lesions of the striato-pallidal complex³¹, and improved by the use of dopaminergic medication^29,39. The ventral striatum has also been implicated in Pavlovian-to-Instrumental Transfer^40,41, which supports its participation in both automatic and deliberate responses to potential rewards. The involvement of the mesolimbic dopaminergic pathway is also supported by a neuroimaging study relating ventral midbrain activity to performance decrement in a task where participants had to catch a prey in a computerized maze⁹. Moreover, this link was modulated by individual measures of motivation for money, which fits with the general idea that pressure effects arise from excessive motivation triggered by monetary incentives.

Another neuroimaging study⁸ found, on the contrary, that detrimental pressure effects on a motor control task were associated with a decrease in ventral striatum activity for the highest incentives. This association was modulated by individual measures of loss aversion, which leads to the interpretation that performance impairment was induced by the fear of losing potential rewards. This sort of interpretation would be compatible with distraction theory, although the link between being distracted by loss prospect and failing to control hand trajectory remains to be specified. We believe that this interpretation is less likely with regard to the reward bias, for two reasons. First, fear of losing is more salient in tasks where the outcome is binary (as when one has to hit a target), whereas the reward bias was observed here with both binary and proportional outcomes. Second, the positive link between incentive and force level was stronger when incentives were presented as potential gains versus potential losses (in supplementary studies). This dissociation distinguishes our pressure effect from ‘hyper-arousal’ interpretations, which would equally concern both the gain and loss domains, and justifies the use of the ‘reward bias’ label. The reward bias may reflect a natural pathway that makes invigoration of behaviour easier to trigger from potential reward than from potential punishment^42,43.

Qualifying the reward bias as a Pavlovian response raises the question of why it has been naturally selected, or in other words, how it could enhance adaptive fitness. At first glance, the reward bias seems maladaptive because it may result in exerting more effort to obtain less reward. Yet our results provide a potential explanation: this response is adapted to high target forces. It could be that a natural correlation exists between reward and effort magnitudes, meaning that on average, larger rewards require more effort to obtain. The reward bias may thus reflect a prior that the brain could have formed through sampling, given these statistical contingencies. Another possibility is that the negative consequences of overshooting are negligible compared to those of undershooting. To take trivial examples: when chasing prey, there is no harm in running too fast, or in throwing a projectile with too much force (provided the direction is accurate). Thus, exerting too much force may be a mechanism that increases the probability of success, or reduces the delay of reward delivery, when there is some uncertainty in the action-outcome mapping. It implies that the benefit of increasing success likelihood would outweigh the cost of exerting more effort. This would be an explanation not only for the reward bias (it is even more important to secure the outcome when it is more rewarding), but also for the motor bias (the mere fact of overshooting).
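The asymmetry argument above can be illustrated with a toy calculation. The following is only a minimal sketch under assumptions that are not part of our studies: success is taken to occur whenever the produced force exceeds a threshold, motor noise is assumed Gaussian, and effort cost is assumed quadratic in the aimed force. All function names and parameter values are hypothetical.

```python
import math


def success_probability(aim, threshold, noise_sd):
    """P(produced force >= threshold), assuming force ~ Normal(aim, noise_sd)."""
    z = (aim - threshold) / noise_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_aim(reward, threshold=0.5, noise_sd=0.1, effort_cost=1.0):
    """Grid search for the aimed force (as a fraction of maximal force, 0 to 1)
    that maximizes expected reward minus an assumed quadratic effort cost."""
    candidates = [i / 1000 for i in range(1001)]

    def expected_utility(aim):
        gain = reward * success_probability(aim, threshold, noise_sd)
        return gain - effort_cost * aim ** 2

    return max(candidates, key=expected_utility)


if __name__ == "__main__":
    for reward in (1.0, 10.0):
        print(f"reward={reward}: best aim = {best_aim(reward):.3f}")
```

Under these assumptions, the utility-maximizing aim lies above the threshold (overshooting as a safety margin) and moves further above it as the reward grows. This pattern would be adaptive when overshooting is harmless, but not in tasks such as ours, where payoff depends on precision.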

Indeed, beyond the reward bias, we also observed a motor bias: irrespective of incentive level, participants overshot small to moderate target forces. It is surprising that participants did not correct for this bias and ended up missing a substantial portion of potential rewards. A first explanation is that participants progressively forgot about the different individual targets, which we call the decalibration effect, and regressed to the mean of the target range. However, this would not explain the motor bias that we observed in experiments where there was only one target (golf putting and supplementary studies). A variant of this explanation is that participants were driven to overshoot low targets, and undershoot high targets, because we presented the target in the middle of the screen. This is unlikely because again, the motor bias was also observed when using a single target force, precluding the possibility of any confusion between targets. A second explanation is that participants regressed to an idiosyncratic default force, which would be, on average, around the medium target level (half their maximal force). One reason for having such a default could be that moderate forces are more frequent in everyday life and therefore set priors on force production. It is also possible that moderate forces are in fact less costly, either because exerting small forces requires more control or because they require co-contractions and therefore the activation of more muscle fibres^44,45. To test this idea we asked participants which target they would choose for an extra session of the experiment. A majority favoured the low target, ruling out the idea that medium targets represent low-cost default forces. Even if they required less effort, it is surprising that participants preferred small targets, for which their performance was poorer on average, compared to medium targets. This result hints at a simpler explanation: participants were not aware of their motor bias, as indicated by their subjective estimates. The minimization of the motor bias in subjective report could arise from the desire to be on target, and thus relate to another cognitive bias such as overconfidence, wishful thinking or social conformism⁴⁶. It could also be favoured by a difficulty in sensing the force through proprioception, in the absence of visual feedback. The implicit motor bias observed here is reminiscent of the finding during tit-for-tat exchanges that produced force is under-estimated compared to received force, which might account for force escalation in social conflict⁴⁷.

Although obtained in a specific motor control task, these results may generalize to everyday-life situations, in sport and beyond, and to other types of pressure. For instance, the phenomenon of social facilitation⁴⁸, meaning the improvement of performance induced by the presence of an audience, may be restricted to action vigour. Indeed, the presence of an audience was shown to increase grip force⁴⁹, the effect being instrumental in this case, to counteract force decay. Previous studies showed that competitive contexts with an audience deteriorated the performance of pianists, on both the technical and artistic levels, because they struck the keys with excessive force^50,51. One could speculate that higher stakes may also undesirably boost the vigour of critical gestures made by professionals such as surgeons or dentists. However, several limitations may reduce the generalizability of our findings. First, the reward bias was observed in the absence of visual feedback: it would be easily correctable if visual information were available to adjust motor commands. Second, it affected a motor skill that was newly learned: it is possible that more habitual movements (requiring less motor learning), or the same movements performed by well-trained professionals such as golf champions, would remain immune to this reward bias (despite anecdotes claiming the opposite). Third, it may be dependent on individual traits such as impulsivity, which can be interpreted as a boost in action vigour with the aim of an earlier reward, or anxiety, which was shown to modulate the relationship between performance and potential outcomes¹³. Thus, the contribution of this cognitive bias to other pressure effects seen in sport, academic and business settings remains to be explored.

In conclusion, we provide evidence for an automatic cognitive process that adjusts action vigour to potential reward, inducing performance decrements in motor control tasks. This reward bias does not impair control precision per se, as measured by the variability of motor outputs, but aggravates the systematic tendency to overshoot low targets. Numerous previous studies have reported qualitatively similar, but quantitatively more pronounced, effects of incentives in settings where action vigour was instrumental in maximizing reward (e.g.^29,30,31). Taken together, these results suggest that incentive motivation is beneficial in situations where reward increases with action vigour (such as running and weightlifting), and detrimental where reward increases with action precision (such as golf putting or basketball free throws).