Interactive effects of incentive value and valence on the performance of discrete action sequences

Incentives can be used to increase motivation, leading to better learning and performance on skilled motor tasks. Prior work has shown that monetary punishments enhance on-line performance while equivalent monetary rewards enhance off-line skill retention. However, a large body of literature on loss aversion has shown that losses are treated as larger than equivalent gains. The divergence between the effects of punishment and reward on motor learning could therefore be due to perceived differences in incentive value rather than valence per se. We tested this hypothesis by manipulating incentive value and valence while participants trained to perform motor sequences. Consistent with our hypothesis, we found that large rewards enhanced on-line performance but impaired the ability to retain the level of performance achieved during training. However, we also found that on-line performance was better with reward than punishment and that the effect of increasing incentive value was more linear with reward (small, medium, large) while the effect of value was more binary with punishment (large vs. not large). These results suggest that there are differential effects of punishment and reward on motor learning and that these effects of valence are unlikely to be driven by differences in the subjective magnitude of gains and losses.

When people are highly motivated, they perform with greater speed and accuracy. Motivation often arises from cues in the environment that signal the prospect of a performance-contingent reward or punishment. Accordingly, the scientific study of motivation in humans often examines the effects of performance-contingent monetary gains and losses on behavior and brain activity 1 . Motivation induced by monetary incentives influences the learning 2 , performance 3 , and long-term retention 4 of skilled actions in humans.
The effects of motivation on skilled action depend on incentive size 5,6 and incentive valence 7 . Compared to training with reward, some studies have found that training in a serial reaction time task with punishing feedback enhances performance during training but impairs later skill retention 2,4,8 . One possibility is that these observed differences between punishment and reward are driven by differences in the subjective magnitude of gains and losses. People often treat losses as having greater (dis)utility than equivalent gains in economic choice tasks, a phenomenon known as loss aversion 9 . If prior valence effects are driven by loss aversion, then the effects of valence (punishment − reward) should be in the same direction as the effects of increasing value (e.g., $30 − $5). In addition, the effects of increasing incentive value should be greater for punishment than for reward. No study to our knowledge has examined the interacting effects of the valence and value of incentives in the context of training skilled actions. So, it remains a possibility that valence effects are really just value effects.
To test this hypothesis, we conducted a study in which we manipulated both incentive value and incentive valence while participants learned motor sequences. In total, 93 human participants trained to perform motor sequences for rewards of varying size (Reward group), punishments of varying size (Punishment group), or no incentives (Control group). Consistent with our hypothesis, we found that large incentives enhanced on-line performance but impaired the ability to retain the level of performance achieved during training. However, we also found that on-line performance was better with reward than punishment and that the effect of increasing incentive value was more linear with reward (small, medium, large) while the effect of value was more binary with punishment (large vs. not large). These results suggest that differential effects of punishment and reward on motor learning are not driven by mere differences in the subjective magnitude of gains and losses.

Participants performed a discrete sequence production task with opportunities for monetary gains (reward) or losses (punishment). Each participant learned three unique 8-item sequences and each sequence was paired with an incentive of $5, $10, or $30. The control group performed the same task but without incentives. Each trial began with a cue signaling which sequence to perform (for all groups) and the size of the associated incentive (for incentive groups). Participants were instructed to perform the sequences as quickly and accurately as they could. Participants returned to the lab a day after training to test how well they retained their level of performance when incentives were no longer present.

Accuracy declined over the course of training (Fig. 3C). In this case, though, the decrease in accuracy over time is likely because our time limits created an imperative to strive for increased speed throughout training.
However, the negative effect of block was less pronounced for higher value sequences, suggesting participants could maintain higher levels of accuracy on these sequences (block × value: β = 0.41, CI = [0.14, 0.66], Pd = .001). Critically, we found a positive main effect of value on accuracy (β = 0.30, CI = [0.14, 0.46], Pd < .001), demonstrating that accuracy improved with increasing value and therefore that incentive-motivated enhancements in movement speed were not accompanied by tradeoffs in movement accuracy. When comparing reward and punishment, we also found evidence for a positive interaction between value and valence (β = 0.28, CI = [−0.04, 0.61], Pd = .05), indicating that the value-driven enhancement of movement accuracy was more pronounced for Reward compared to Punishment. This was confirmed by fitting the model separately to data from each group, which showed that the value effect was greater for the Reward group (β = 0.44, CI = [0.23, 0.64], Pd < .001) compared to the Punishment group (β = 0.16, CI = [−0.11, 0.43], Pd = .12). Lastly, a model-comparison analysis demonstrated that for Punishment data, a model with a binary value function predicted held-out data better than a model with a linear value function (binary − linear: M = −10.4, SE = 6.57) (Fig. 3A). However, for Reward data, a linear value function led to better predictive accuracy (linear − binary: M = −12.6, SE = 6.93). Consistent with our speed results, the effects of increasing incentive value on movement accuracy were different for Reward and Punishment.

Value-driven impairment of performance retention. The results above show that incentives influenced performance during training. However, it remains to be seen whether these training incentives have a lasting influence on performance that persists when incentives are no longer available. We addressed this possibility by having participants return one day after training to perform the same sequences again without incentives.
We applied the same regression models as above to response initiation, movement speed, and movement accuracy data at test. The effect of value from training on response initiation persisted to the test session, as there was a negative main effect of value on RT (β = −0.23, CI = [−0.39, −0.08], Pd = .003) (Fig. 5A). Response initiation improved over the course of the test session, as evidenced by a negative linear effect of test block on RT (β = −0.11, CI = [−0.22, 0.01], Pd = .03). We did not find evidence for other main effects or interactions in the model (all Pd > 0.1).
The effect of value from training on movement speed persisted to the test session, as there was a positive main effect of value on speed (β = 0.14, CI = [0.03, 0.25], Pd = .02) (Fig. 5B). Movement speed also improved over the course of the test session, as evidenced by a positive linear effect of test block on speed (β = 0.27, CI = [0.19, 0.34], Pd < .001). We did not find evidence for other main effects or interactions in the model (all Pd > 0.1).
While there was no main effect of prior value on movement accuracy during the test session (β = −0.02, CI = [−0.18, 0.14], Pd = .40), we did find some evidence of a small negative main effect of valence (β = −0.16, CI = [−0.35, 0.04], Pd = .06) (Fig. 5C). This appeared to be driven by especially low accuracy on sequences trained with $5 reward, as evidenced by a negative value by valence interaction (β = −0.26, CI = [−0.57, 0.04], Pd = .05). Movement accuracy was relatively constant throughout the test session, as there was no effect of block (β = −0.04, CI = [−0.15, 0.07], Pd = .22). We did not find evidence for other main effects or interactions in the model (all Pd > 0.1). Overall, it seems that the beneficial effects of value on accuracy did not persist into the test session and that training with small punishment may be more favorable for future movement accuracy than training with small reward.
The results above show that many of the beneficial effects of incentives on performance persisted to a test session the next day. But do incentives also protect against performance degradation over time? We addressed this question by comparing motor performance (movement speed, in keys per second) at the end of training with performance at the start of the test session. Overall, it appears that retention worsened with increasing incentive value, where again this incentive value effect was greater for reward than punishment, such that retention was worst for sequences paired with large rewards. However, it is unclear if this performance degradation is due to a loss of skill per se or rather a loss of motivation, given that we did not measure performance on random sequences paired with incentives (see discussion).

Discussion
Training to perform motor sequences with incentives led to enhancements of movement initiation, speed, and accuracy for sequences paired with large reward, and these value effects persisted into the following day when participants were tested without incentives. Participants were unable to retain the level of performance they attained in training, and retention was worst for motor sequences paired with high-value reward. However, the effects of increasing incentive value on on-line performance improvements were distinct for reward and punishment. Responses to punishment values showed a more binary, categorical profile (large vs. not large), while responses to reward values showed a more linear profile. These results add nuance to prior work showing a dissociation between the effects of reward and punishment on motor skill learning.

We found that actions trained in association with large prospective incentives (+/−$30) were performed and acquired more rapidly compared to sequences trained with small incentives (+/−$5). A plausible explanation for this finding is that increasing incentive size increases the expected utility of success, thereby "paying the costs" associated with exerting more task-positive effort [10][11][12] . On this view, larger incentives make increased effort more worthwhile, and increased effort improves the performance and acquisition of skilled actions. When people are offered large rewards, they are more willing to expend mental effort [13][14][15] and physical effort 14,16,17 , and when people exert more effort, their performance and learning improve [18][19][20][21] . But sometimes conditions which enhance learning actually impair long-term retention 22 . Consistent with this, we found that the performance levels for actions paired with larger rewards were not retained as well as the performance levels for actions paired with smaller rewards.
We found that actions trained in association with rewards (i.e., monetary gains) were acquired better than actions trained with punishments (i.e., monetary losses). Some studies have examined the differential effects of reward and punishment on skill learning, and the results have been mixed. Training with reward feedback has been associated with improved learning in a serial reaction time (SRT) task 2 , improved long-term memory retention in a force-tracking task 23 , and slower forgetting in a visuomotor adaptation task 24 . On the other hand, it has been shown that punishment leads to faster overall movement times during training in the SRT task 8 and that training with punishment leads to faster on-line learning in a visuomotor adaptation task 24 . It appears that the effects of punishment and reward on skill learning depend on which task is being performed. As these tasks differ in the precise perceptual, cognitive, and motor demands placed on participants, it would be valuable to carefully determine how each of these processes is impacted by motivation.
Some previously observed effects of incentive valence on motor skill learning could have been due to loss aversion: perhaps punishment and reward had different effects because losses had greater subjective (dis)utility than gains 9 . Consistent with this view, we found that the effect of increasing incentive value resembled the effects of punishment (compared to reward) reported in prior work, namely enhanced on-line performance and impaired retention. We found evidence that $30 reward led to the best on-line performance but the worst retention, compared to punishments or smaller rewards. However, we did not find that punishment enhanced training performance over reward. If anything, our results suggest that the effects of increasing incentive value were more pronounced for reward than punishment. Thus, it is unlikely that loss aversion can explain the differential effects of punishment and reward on motor learning.
The differential effects of punishment and reward on skill learning could arise from differences in the way punishment and reward are processed by the brain that are unrelated to loss aversion. For example, we found evidence that, in terms of their effects on behavior, rewards were encoded linearly while punishments were encoded in a binary manner. This encoding difference could be grounded in one of the many differential effects of punishment and reward on brain areas (e.g., orbital frontal cortex 1 , dorsal striatum 25 , ventral striatum 2 , ventral tegmental area 26 , hypothalamus 27 , medial prefrontal cortex 28 , insula 2 , and amygdala 27 ) or pathways (e.g., dopaminergic 26 , GABAergic 26 and serotonergic 29 ). Future work should explore whether differences in the underlying neural mechanisms of reward and punishment processing can account for the differential effects of reward and punishment value on skill learning.
Consistent with prior work, we found that training with incentives, particularly large incentives, enhanced the on-line performance of action sequences compared to training with no incentives 2,8 . However, we found that training with incentives, particularly large rewards, was associated with a greater inability to maintain performance from training to test. This is consistent with the fact that conditions which foster rapid skill acquisition can impair long-term skill retention 22 . Studies have demonstrated several desirable difficulties 30 , such as varying practice conditions 31 , spacing study sessions 32 , and interleaving instruction 33 . Relatedly, it is possible that our control group benefitted from encoding specificity 34 or transfer-appropriate processing 35,36 , whereas these beneficial context effects would be reduced for our incentive groups since there were no incentives in the test session. Furthermore, training without incentives may have encouraged intrinsic motivation, while training with incentives discouraged it, a phenomenon known as the undermining effect [37][38][39] . The undermining effect may have led participants in the incentive groups to be less motivated in the retention test session compared to participants in the non-incentive group. Future work would be required to disentangle these alternative accounts.
Two findings in the present study contradict the results of prior work and deserve extra attention. The first is that participants in the reward group improved more quickly during training than participants in the punishment group, whereas prior studies found that punishment enhanced online performance relative to reward 8,40 . However, a salient methodological difference between our study and prior studies is that we administered rewards and punishments prospectively with pre-trial cues, while prior work administered incentives retrospectively with feedback. Prospective and retrospective incentives may differentially engage error-based learning mechanisms 41 . For example, prospective incentive cues may bias learning to occur at the level of motor planning and action selection, while retrospective cues may bias learning to occur at the level of movement execution. While prior work has administered reward prospectively and found robust effects on the performance of learned skills, to our knowledge no prior study has examined the effects of prospectively administered punishment 3,42 . An interesting avenue for future research would be to directly compare the effects of prospective and retrospective incentives on skill learning and retention.
The second contradictory finding is that sequences paired with large reward showed larger performance decrements from training to retention test compared to sequences paired with no reward, whereas prior studies found that reward enhanced retention relative to control 2,4,40 . While our findings contradict the findings of the particular studies cited above, other studies have reported similar failures to replicate 8,43 . In fact, we expected the effects of large rewards to resemble the effects of punishment reported in prior work, namely enhanced on-line performance and impaired off-line retention. The basis of this prediction is the hypothesis that the valence effects are driven by value differences vis-à-vis loss aversion. However, we note that our results directly comparing punishment to reward do not necessarily support this loss aversion hypothesis. It remains unclear what factors explain whether the prospect of reward enhances or impairs retention. Future studies could shed light on this puzzle by more precisely operationalizing skill retention so as to avoid potential confounds, as we discuss below.

In the present study, we operationalize learning as the improvement in performance with practice and we operationalize retention as the change in performance from the end of training to a short test the next day. We are limited by our experimental design in the specific conclusions we can draw about the memory representations involved in learning and retention. For example, it is unclear whether the changes in performance over time observed here are related to sequence-specific skill (e.g., 1-2-4-3-2-4-3-1), a more general task skill (e.g., typing), or changes in motivation or arousal 44,45 . We think it is likely that all of these aspects are involved, but future studies are required to tease apart their relative importance.
While it was not feasible here, given the already large number of conditions, future studies should administer untrained sequences during training and incentivized sequences during test to allow for all the comparisons necessary to adjudicate between these alternative explanations of the behavioral results.
In sum, we provide evidence that on-line motor performance was enhanced for sequences paired with large reward, while off-line performance retention was impaired. Effects of increasing incentive value were different for rewards and punishments, where sequences paired with $30 reward were performed the best during training but were retained the worst at test the next day. Whereas reward effects were linear, punishment effects tended to be binary. These findings add nuance to the growing literature on the effects of punishment and reward on skill learning by showing that incentive valence interacts with incentive value in complex and surprising ways.

Ethics. All participants gave written informed consent to participate. Participants were compensated at a rate of ten U.S. dollars per hour plus cash bonuses. This study was approved by the University of Michigan IRB for Health Sciences and Behavioral Sciences and all methods were performed in accordance with the relevant guidelines and regulations. All software and data used in this manuscript will be made publicly available upon publication.

Experiment
A diagram of the experimental protocol is shown in Fig. 1A. Participants completed a familiarization phase consisting of four blocks of 40 trials of the DSP task with random sequences. Next, they completed a training phase consisting of nine blocks of 36 trials of the DSP task with three cued sequences. For the Reward group, each sequence was paired with a $5, $10, or $30 reward; for the Punishment group, each sequence was paired with a -$5, -$10, or -$30 punishment; for the Control group, sequences were not paired with incentives. At the start of the experiment, participants in the Punishment group were endowed with $30 from which their potential losses would be deducted. During the training phase, success was based in part on a time limit that became stricter as the participant's performance improved.
After the training phase, participants went home and did not practice for 24 h. After this day of rest, participants returned to the lab to complete a retention test consisting of three blocks of 40 trials. Thirty of these trials were random sequences, while 90 were the three sequences they had trained on previously (30 trials each). At the end of the experiment, for the incentive groups, one trial was selected at random from the training session. If the sequence was completed without errors and faster than the time limit, the participant would gain or lose the amount associated with the sequence. Participants were given the option to receive payments at the end of each session; however, everyone chose to receive all their payment at once at the end of the test session.

Task. A diagram of an example successful DSP trial in the training session is shown in Fig. 1B. Trials with cued sequences began with the presentation of a colored square associated with a unique sequence. For the incentive groups, the colored square was offset to the left and the incentive size was presented simultaneously in text to the right of the square (e.g., "$30"). The color cue and reward cue were redundant because there was a 1:1 mapping between sequence and incentive value (each sequence was paired with a unique value). For the control group, no incentive information was presented and the colored squares signifying sequence identity were presented centrally. Familiarization trials with random sequences began with a grey square. The cue period with sequence/reward information lasted for two seconds. Next, four gray squares were presented in a horizontal array and remained on the screen for one second. Then, one square changed color to white, indicating which of four keys the participant was to press with their left-hand (non-dominant) fingers (1: little finger, 2: ring, 3: middle, 4: index). Immediately after the correct key was pressed, the corresponding square changed color to black and a new square changed color to white, indicating the next key to press. This process repeated until the sequence was completed or a key-press error occurred. If a key-press error occurred, the most recently cued square changed color from white to red, and the trial was aborted after a 1 s delay. If the sequence was completed too slowly (i.e., completion took more time than allowed by the current time limit), then the stimuli disappeared and feedback text stating "Too slow" was presented centrally for 1 s. No additional feedback was given at the end of trials in which all eight key presses were completed within the time limit.

During the training session, each participant was constrained by a movement time limit that was dynamically updated to ensure that task difficulty was relatively stable within and between subjects. The time limit was defined as the median movement time on the previous twenty correct trials, regardless of sequence. This time limit was used to determine whether the participant responded too slowly and hence whether they would gain or lose money if that trial were randomly selected at the end of the experiment (assuming there were no keypress errors).

Incentive instructions. Participants in the Reward group were told that they would be playing to win a cash bonus of $5, $10, or $30, which would be displayed at the beginning of each trial. They were told that a colored square would indicate which sequence they must type and that each color would always be worth the same amount of money. They were encouraged to memorize these sequences as this would help them prepare, type faster, and win more bonus money. We clarified that they would not be accumulating money, but instead that at the end of the experiment, a trial would be chosen at random. If they correctly pressed all 8 buttons in time for that chosen trial, they would win the reward that was displayed for that trial; if they got it wrong, they would not win or lose any money. Participants in the Punishment group were similarly told that they would be playing to win a cash bonus. We told them that they would be endowed with a $30 bonus at the outset of the experiment, but that they would have to perform well to avoid losing this money and that the penalty value would be displayed on each trial to tell them how much was at stake on that trial ($5, $10, or $30). They were encouraged to memorize these sequences as this would help them prepare, type faster, and avoid losing their $30 bonus.
Finally, we clarified that if they correctly pressed all 8 buttons in time for the randomly chosen trial, they would keep their $30 bonus (plus their hourly wages of $20); if not, they would lose the amount of money displayed for that trial, which would be subtracted from their initial $30 bonus. Other instructions were identical to those of the Reward group.
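The adaptive time limit described above (the median movement time over the previous twenty correct trials) can be sketched as a rolling-window update. This is an illustrative reconstruction, not the authors' code; the class name and the 8000 ms starting limit are our assumptions.

```python
from collections import deque
from statistics import median

class AdaptiveTimeLimit:
    """Rolling median of movement times on the last `window` correct trials.

    Sketch of the rule described in the text; the starting limit of
    8000 ms is a placeholder, not a value reported by the authors.
    """

    def __init__(self, window=20, initial_limit_ms=8000):
        self._times = deque(maxlen=window)  # only correct trials enter the window
        self._initial = initial_limit_ms

    def limit(self):
        # Before any correct trials, fall back to the assumed starting limit
        return median(self._times) if self._times else self._initial

    def record_correct(self, movement_time_ms):
        # Correct trials update the window regardless of which sequence was typed
        self._times.append(movement_time_ms)
```

Because the window holds only correct trials, the limit tightens automatically as the participant speeds up, keeping success rates roughly comparable within and between subjects.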
Data and statistics. Data visualization was performed using the R package ggplot2 46 . Error bars in the plots reflect within-subject standard errors, i.e., the standard error of (y − participant mean + grand mean) 47 . Our plots use Wong's color-scale, which was designed to be accessible to colorblind readers 48 .
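The within-subject normalization above can be made concrete with a small sketch (the data values are our own illustrative numbers, not the study's). Subtracting each participant's mean and adding back the grand mean removes between-subject offsets, so the remaining variability reflects only within-subject condition differences:

```python
import numpy as np

# Toy data: rows = participants, cols = conditions (illustrative values only).
# Every participant shows the same +1 effect per condition, but participants
# differ in overall level.
y = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [0.0, 1.0, 2.0]])

# Within-subject normalization: y - participant mean + grand mean
adjusted = y - y.mean(axis=1, keepdims=True) + y.mean()

# Standard error of each condition mean, computed on the adjusted scores
within_se = adjusted.std(axis=0, ddof=1) / np.sqrt(y.shape[0])
```

Here the condition effect is identical for every participant, so the within-subject standard errors are exactly zero even though the raw between-subject standard errors are not, which is precisely the point of this style of error bar.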
Regression models were implemented using the R package brms: Bayesian Regression Models using Stan 49 (v2.14.4; https://paul-buerkner.github.io/brms/). The backend of brms is the probabilistic programming language Stan, which uses Markov Chain Monte Carlo (MCMC) sampling to compute approximate posterior probability distributions for model parameters, such as regression coefficients 50 . We configured the MCMC sampler to run four chains in parallel with 4000 warmup and 6000 post-warmup iterations per chain. We assigned weakly informative default priors to all parameters (e.g., standard normal distributions for main effects and interactions) 51 . For each regression coefficient, we report the median estimate ( β ), the 95% credible interval 52 ( CI ), and the proportion of the posterior with the wrong sign ( Pd ), equal to one minus the probability of direction (see this vignette for more on Pd and its relation to the frequentist p-value: https://easystats.github.io/bayestestR/articles/probability_of_direction.html) 53 . To illustrate, we would summarize a Normal(1,1) posterior parameter estimate as β = 1, CI = [−0.96, 2.96], Pd = .16. Posterior summaries were obtained using the R package bayestestR (v0.7.5; https://easystats.github.io/bayestestR/index.html).
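The Normal(1,1) illustration can be reproduced from posterior draws. The sketch below uses simulated samples rather than brms/bayestestR output, but the summaries are computed the same way: a posterior median, equal-tailed 95% interval, and Pd as the posterior mass on the wrong side of zero.

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(1.0, 1.0, size=200_000)   # stand-in for posterior samples ~ Normal(1, 1)

beta_hat = np.median(draws)                  # reported point estimate (posterior median)
ci = np.percentile(draws, [2.5, 97.5])       # equal-tailed 95% credible interval
# Pd: proportion of the posterior with the wrong (minority) sign,
# i.e., one minus the probability of direction
pd = min((draws < 0).mean(), (draws > 0).mean())
```

With these draws the summaries land near β = 1, CI = [−0.96, 2.96], Pd = .16, matching the worked example in the text (Pd here is Φ(−1) ≈ 0.159 for a Normal(1,1) posterior).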
Our analyses of training data focused on initial reaction time, movement speed, and keypress accuracy. We measured RT as the duration between stimulus onset and the first response in the sequence in milliseconds. We measured speed as the number of keys pressed (8) over movement time in keys per second (where movement time is the duration between the first and last key presses). Our analyses of RT and speed included only trials without keypress errors (i.e., complete sequences). For all three measures, speed, accuracy, and RT, we modeled means for each level of subject by value by valence by block, averaging over trials.
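The two performance measures defined above can be computed directly from keypress timestamps. The helper below is our own illustration (the function name and argument layout are assumptions): RT is the first press minus stimulus onset, and speed is the number of keys pressed divided by the first-to-last-press interval.

```python
def trial_metrics(stim_onset_ms, press_times_ms):
    """RT (ms) and movement speed (keys/s) for one completed sequence.

    Hypothetical helper, not the authors' code. Assumes `press_times_ms`
    holds the timestamps of all correct key presses, in order.
    """
    rt_ms = press_times_ms[0] - stim_onset_ms
    # Movement time: duration between the first and last key presses
    movement_time_s = (press_times_ms[-1] - press_times_ms[0]) / 1000.0
    speed_kps = len(press_times_ms) / movement_time_s
    return rt_ms, speed_kps
```

For an 8-item sequence with presses every 100 ms after a 100 ms initiation, this yields an RT of 100 ms and a speed of 8 keys over 0.7 s, about 11.4 keys per second.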
Our models of mean RT and speed used the gaussian response function, with density

f(y) = (1 / (σ√(2π))) exp(−(y − μ)² / (2σ²)),

where σ is the standard deviation of the residuals 54 . Our models of mean accuracy used a beta response function, with density for y ∈ (0, 1) given by

f(y) = y^(μφ−1) (1 − y)^((1−μ)φ−1) / B(μφ, (1 − μ)φ),

where B is the beta function and φ is a positive precision parameter 54 . Since our accuracy data could also take on values at zero and one, we used a zero-one-inflated version of the beta family, with density

f(y) = α(1 − γ) if y = 0; αγ if y = 1; (1 − α) f_Beta(y) if y ∈ (0, 1),

where α is the probability of zero or one and γ is the probability of one but not zero 54 . All models included fixed effects of value, valence, linear block, quadratic block, linear block by value interaction, value by valence interaction, and linear block by value by valence interaction. Additionally, intercepts and main effects of block and value were allowed to co-vary across participants around a multivariate population mean (i.e., the model allowed for 'random effects' of participant). Value (5, 10, 30) and valence (Punishment, Reward) were coded using linear contrasts (−0.5, 0, 0.5) and block was coded using orthogonal 2nd-degree polynomial contrasts. RT and speed were z-scored prior to modeling. Overall, our RT and speed models were specified as

(1) y_i ~ Normal(X_i β_j(i), σ), with β_j = β + b_j and b_j ~ MultivariateNormal(0, diag(τ) L L′ diag(τ)),

and our accuracy model was specified analogously with the zero-one-inflated beta response function, with μ_i = logit⁻¹(X_i β_j(i)). In the specifications above, i denotes a particular trial and j denotes a particular subject. X_i are the predictors for a particular trial, β_j are the regression coefficients for a particular subject, β are the group mean regression coefficients, τ are the standard deviations of the coefficients across subjects, L represents the correlation of coefficients across subjects, and b_j are the random effects of subject on the regression coefficients. Note that this hierarchical parameterization was only applied to the within-subject main effects; for interactions and valence effects, β_j = β.
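The zero-one-inflated beta density described above can be written out directly. The sketch below is our own illustration of that mixture (function names are ours, and it mirrors the mean-precision beta parameterization used in the text): point masses at 0 and 1 governed by α and γ, and a rescaled Beta(μφ, (1−μ)φ) density on the interior.

```python
from math import gamma

def beta_pdf(y, a, b):
    # Beta(a, b) density computed via the gamma function: B(a, b) = Γ(a)Γ(b)/Γ(a+b)
    return gamma(a + b) / (gamma(a) * gamma(b)) * y**(a - 1) * (1 - y)**(b - 1)

def zoib_density(y, mu, phi, alpha, gam):
    """Zero-one-inflated beta with mean mu and precision phi.

    alpha = P(y in {0, 1}); gam = P(y = 1 | y in {0, 1}); the continuous
    part on (0, 1) is (1 - alpha) * Beta(mu*phi, (1-mu)*phi).
    """
    if y == 0.0:
        return alpha * (1.0 - gam)
    if y == 1.0:
        return alpha * gam
    return (1.0 - alpha) * beta_pdf(y, mu * phi, (1.0 - mu) * phi)
```

Setting μ = 0.5 and φ = 2 makes the interior component a uniform Beta(1, 1), so the density on (0, 1) is simply 1 − α, which makes the mixture weights easy to verify by hand.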
We also examined whether the value-response function was different for the two valence groups. We assessed this by comparing models that differed solely in the contrast used to encode value: linear (−0.5, 0, 0.5) or binary (−0.25, −0.25, 0.5). In this case, the models were estimated and compared separately for each group. We compared models using an approximate leave-one-out cross-validation (LOO-CV) score intended to estimate the model's expected log predictive density (ELPD) for out-of-sample data 55 :

elpd = Σ_{i=1}^{n} ∫ p(ỹ_i) log p(ỹ_i | y) dỹ_i,

where the p(ỹ_i)'s are unknown quantities which represent the true generative process for ỹ_i and which can be approximated using cross-validation. In place of the true theoretical ELPD defined above, we use a leave-one-trial-out Bayesian estimate thereof, defined in Vehtari et al. (2017) as

elpd_loo = Σ_{i=1}^{n} log p(y_i | y_{−i}),

where p(y_i | y_{−i}) is the leave-one-out predictive density given the data sans the ith point. When the absolute mean difference in ELPD between two models exceeds the standard error of the differences (Eq. 23 in Vehtari et al. (2017)), then the out-of-sample data are predicted better by the model with higher ELPD. We implement this model comparison analysis using the R package loo (v2.3.1; https://mc-stan.org/loo/index.html).
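The comparison rule above operates on pointwise ELPD contributions. The sketch below illustrates the arithmetic with toy numbers (the arrays are our own illustrative values, not the study's loo output): the ELPD difference is the sum of pointwise differences, and its standard error is √n times the standard deviation of those differences, in the style of Vehtari et al. (2017).

```python
import numpy as np

# Toy pointwise LOO elpd values for two candidate models (illustrative only)
elpd_linear = np.array([-1.2, -0.8, -1.0, -1.5, -0.9])
elpd_binary = np.array([-1.4, -0.7, -1.3, -1.6, -1.1])

pointwise_diff = elpd_linear - elpd_binary          # per-trial differences
elpd_diff = pointwise_diff.sum()                     # total ELPD difference
# SE of the difference: sqrt(n) * sd of the pointwise differences
se_diff = np.sqrt(len(pointwise_diff)) * pointwise_diff.std(ddof=1)
better = "linear" if elpd_diff > 0 else "binary"
```

Here the total difference (0.7) exceeds its standard error (about 0.34), so under the stated rule the linear-contrast model would be preferred for these toy data.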
We fit a separate set of regression models to data from the test session, including RT, speed, and accuracy. These models included the same predictors as listed above. We also directly compared performance at the end of training (last 3 blocks) to performance in the test session (first 3 blocks) using a regression model of mean movement speeds at each level of block by session by value by valence. This regression model differed from those described above only in the inclusion of the session predictor (sesh) and its interactions with the other predictors: sesh*value, sesh*valence, block*sesh, block*sesh*value, block*sesh*valence, block*sesh*value*valence. While we were primarily interested in the effects involving session (e.g., sesh*value*valence), we included other predictors to ensure they were not driving the effects of interest. Intercepts and the main effects of block, session, and value were allowed to co-vary across participants around a multivariate population mean.