Optimal motor decision-making through competition with opponents

Even though optimal decision-making is essential for sports performance and fine motor control, it has been repeatedly confirmed that humans show a strong risk-seeking bias, selecting a risky strategy over an optimal solution 1-9 . Despite such evidence, how to promote optimal decision-making remains unclear. Here, we propose that interactions with other people can influence motor decision-making and reduce this risk-seeking bias. We developed a competitive reaching game (a variant of the ‘chicken game’) in which aiming for a greater reward increased the risk of receiving no reward, and subjects competed with an opponent for the total reward. The game resembles situations in sports, such as a penalty kick in soccer, a service in tennis or the strike zone in baseball. In three experiments, we demonstrated that, at the beginning of the competitive game, the subjects robustly switched from a risk-seeking strategy to a risk-averse strategy. Following this reversal, the subjects achieved optimal decision-making when competing with risk-averse opponents. This optimality was achieved through a non-linear influence of the opponent’s decisions on the subject’s decisions. These results suggest that interactions with others can alter motor decision strategies and that competition with a risk-averse opponent is key to optimising motor decision-making.


Introduction
Optimal decision-making is indispensable for ideal performance in sports and for fine motor control in everyday life. For example, selecting an appropriate trajectory when reaching for a glass of water can lower the risk of spilling it; likewise, finding a running line that is easy to pass through in rugby or deciding the best place to aim in a tennis match can increase the chance of winning. Despite the importance of optimal decision-making, sub-optimal and overly risk-seeking behaviours have been reported in various motor decision tasks for over a decade [1][2][3][4][5][6][7][8][9] . Determining how to improve sub-optimal, risk-seeking decision-making is crucial for enhancing well-being in daily life and performance in sports. However, how motor decision-making can be optimised remains unknown.
One possible route to optimal decision-making is interaction with other people. It has been shown that observing other people's movements induces synchronisation of one's own movement speed during a competitive game 10 , facilitates movement adaptation 11 and influences the prediction of other people's movements 12 . Since risk-seeking behaviour has been reported in motor tasks that subjects perform alone, the presence of other people may influence sub-optimal motor decisions.
Here, we investigated how humans alter their motor decision-making in a competitive game (a variant of the 'chicken game') that requires naturalistic interaction with other people.
We had two main hypotheses. First, if the decision system simply imitates an opponent's movement, then a linear relationship between the subject's and opponent's decisions should be observed. If this is correct, optimal decisions should be achieved when the opponent's decisions are also optimal. This hypothesis is based on evidence that the unintended imitation of movement speed or distance occurs in a competitive situation 10 . Second, if the decision system adaptively changes the motor plan based on the opponent's movements, then a non-linear relationship should be observed. If this hypothesis is true, optimal decisions should be achieved when the opponent's decisions are sub-optimal.
To test these hypotheses, we assessed subjects' behaviour during competition with a virtual opponent who behaved either optimally or sub-optimally. First, we show that the direction of the sub-optimality of motor decisions is reversed from risk-seeking to risk-averse at the beginning of the competitive situation. Second, following this reversal, we demonstrate that competition with sub-optimal, risk-averse opponents promotes optimal decision-making. Finally, to explain these findings, we show that the subjects' decisions are affected by the opponents' decisions in a non-linear manner.

Results
Subjects performed a quick out-and-back reaching task (moving forward from the start position and returning to it) using a pen-tablet (Fig. 1a,b). A cursor corresponding to the position of a digitised pen was presented on a vertical screen. The endpoint of each movement was defined as the maximum y-position (Fig. 1b), and in each trial the subjects were rewarded according to the endpoint following an asymmetric gain function (Fig. 1c). The subjects scored more points the closer the endpoint was to the green boundary line (set 30 cm forward from the start position); however, if the endpoint crossed the boundary line, the score was set to 0. The nature of this game resembles several situations in sports, such as a penalty kick in soccer, a service in tennis or the strike zone in baseball. The subjects could aim at any point on the screen. For the selected aim point, the actual endpoint was probabilistic owing to the inherent noise of the motor system. Therefore, the subjects were required to make continuous motor decisions about where to aim, taking this inherent motor noise into account.
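A minimal sketch of the endpoint rule, assuming each trial's pen trajectory is stored as a list of (x, y) samples in centimetres with y measured forward from the start position. The data layout and the function name `reach_endpoint` are hypothetical; only the rule that the endpoint is the maximum y-position comes from the text.

```python
from typing import Sequence, Tuple

# Hypothetical layout: one (x_cm, y_cm) sample per tablet reading,
# with y measured forward from the start position.
Sample = Tuple[float, float]


def reach_endpoint(trajectory: Sequence[Sample]) -> float:
    """Return the endpoint of an out-and-back reach as its maximum y-position."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one sample")
    # The reach goes out and comes back, so the furthest-forward sample
    # (largest y) is the turning point, i.e. the endpoint.
    return max(y for _, y in trajectory)


# Example: a reach that turns around at y = 27.5 cm and returns to the start.
trial = [(0.0, 0.0), (0.1, 12.0), (0.2, 27.5), (0.1, 20.0), (0.0, 1.0)]
print(reach_endpoint(trial))  # 27.5
```

Taking the maximum rather than the last sample matters because the cursor returns to the start, so the final sample says nothing about where the subject aimed.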
Two tasks were used: an individual task requiring the subjects to maximise the total score within each block (10 trials/block) and a competitive task requiring them to obtain a higher total score than their opponents within each block. In Experiment 1, the subjects were randomly divided into three groups: risk-neutral, risk-averse and individual groups. In the risk-neutral group, the subjects (N = 9) competed against virtual opponents whose aim points were set to the optimal aim point (see Design and Procedure). Before each block of the competitive task, the optimal aim point was calculated by maximising the expected gain based on each subject's endpoint variability over the past 40 trials 5 (see Model assumptions). Because the subjects' endpoint variability decreased as the blocks progressed, the risk-neutral opponents' aim points gradually increased (red line in Fig. 4a). In the risk-averse group, the subjects (N = 8) competed against opponents who gradually changed their aim point from optimal to sub-optimal and risk-averse (red line in Fig. 4b). The opponents' actual endpoint varied from trial to trial and followed a Gaussian distribution.
Three experimental sessions were performed. The subjects in the risk-neutral and risk-averse groups performed 5 blocks of the individual task (baseline), 12 blocks of the competitive task (competition) and 5 blocks of the individual task (washout). In the individual group, the subjects (N = 10) performed the individual task for 22 blocks. Based on Bayesian decision theory 13-15 , we determined each subject's risk-sensitivity in the individual (baseline) and competitive tasks as the deviation of the actual aim point (observed mean endpoint) from the optimal aim point (see Model assumptions). A positive value (actual aim point larger than the optimal aim point) indicated that the subject adopted a sub-optimal, risk-seeking strategy, seeking a high one-trial reward at a high probability of failure. In contrast, a negative value (actual aim point smaller than the optimal aim point) indicated a sub-optimal, risk-averse strategy, accepting a low one-trial reward to avoid a high probability of failure. If risk-sensitivity was close to 0, the subject was considered to have made optimal, risk-neutral decisions. Figure 2a and b illustrate the time series of the reaching endpoint from the baseline to the competition. A comparison of the actual and optimal aim points revealed that the subjects adopted a risk-seeking strategy at the baseline (Fig. 2a',b'); however, they shifted to a risk-averse strategy by decreasing the reaching endpoint from the baseline at the beginning of the competition (Fig. 2a,a',b,b'). This shift in strategy, a significant decrease in endpoint from the mean endpoint (actual aim point) at the baseline, was found in the first several trials of the competitive task (Fig. 2a,a') but was seen neither when the subjects continued the individual task (Fig. 2c,c') nor during the period from competition to washout (Supplementary Fig. 1a,a',b,b').
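The risk-sensitivity measure reduces to a subtraction. The sketch below assumes the optimal aim point has already been computed (see Model assumptions) and that endpoints are in centimetres; the helper names and the tolerance band `tol` are hypothetical, since the study judged closeness to 0 statistically rather than with a fixed threshold.

```python
from statistics import mean
from typing import Sequence


def risk_sensitivity(endpoints: Sequence[float], optimal_aim: float) -> float:
    """Observed mean endpoint (actual aim point) minus the optimal aim point, in cm."""
    return mean(endpoints) - optimal_aim


def label_strategy(rs: float, tol: float = 0.0) -> str:
    """Map a risk-sensitivity value to a strategy label.

    `tol` is a hypothetical band around 0; it stands in for the
    statistical test against 0 used in the study.
    """
    if rs > tol:
        return "risk-seeking"
    if rs < -tol:
        return "risk-averse"
    return "risk-neutral"


# Hypothetical block: endpoints lying beyond a 28.0 cm optimal aim point.
rs = risk_sensitivity([29.0, 29.4, 28.6], optimal_aim=28.0)
print(round(rs, 2), label_strategy(rs, tol=0.2))  # 1.0 risk-seeking
```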
Although the risk-seeking strategy has been shown to be robust even after 9 days of repetitive practice 6 , our results indicate that it can be switched to a risk-averse strategy when competing with a novel opponent. In other words, interactions with other people significantly altered motor decision-making.
We further validated this reversal of risk-sensitivity using additional experiments.
Since human motor control is influenced by both the intrinsic uncertainty of the motor system 16 and the extrinsic uncertainty of the environment 17,18,19 , we attempted to attenuate these uncertainties before starting the competition. The subjects in the individual group from Experiment 1 performed the competitive task with risk-neutral opponents after completing 22 blocks of the individual task. This practice reduced intrinsic uncertainty: the SD of the reaching endpoint in the last five blocks was 0.73 times that in the first five blocks. In Experiment 2, a new group of 11 subjects was recruited (presentation group). To attenuate the extrinsic uncertainty of the opponents' behaviour, the subjects were shown the movements of the risk-neutral opponents for 10 trials before starting the competitive task. Even though the subjects had improved their reaching accuracy or had knowledge of their opponents in advance, the reversal of risk-sensitivity (from risk-seeking to risk-averse) occurred at the beginning of the competitive task (Fig. 2d,d',e,e'). The reversal was also robust when the subjects competed against human opponents in Experiment 3 (Fig. 2f,f'). These findings suggest a clear tendency to abandon the original risk-seeking strategy and to start a competition conservatively, even when the intrinsic and extrinsic uncertainties are attenuated or when competing against a human opponent.
Following the reversal of risk-sensitivity at the onset of the competitive task, we investigated the influence of the opponents' decision-making on the subjects' risk-sensitivity.
Again, the risk-seeking strategy (a positive risk-sensitivity value) was adopted at the baseline in all three groups (Fig. 3) and persisted in the individual group (Fig. 3) … (Fig. 3; ps < 0.05, Bonferroni correction). No significant differences were observed among the three groups in terms of the SD of the reaching endpoint or the optimal aim point (Supplementary Fig. 4a,b). These results indicate that the sub-optimal, risk-seeking strategy was modified by the presence of the opponent. Specifically, the optimal, risk-neutral strategy was promoted by competition with a sub-optimal, risk-averse opponent.
Next, we addressed why competition with sub-optimal, risk-averse opponents led to optimal, risk-neutral decision-making. To specify the relationship between the opponents' and subjects' decisions, we calculated three indices of motor decision-making: the subject's mean endpoint across the five blocks of the individual task, the subject's mean endpoint in each block of the competitive task and the opponent's mean endpoint in each block of the competitive task. The subjects in the risk-neutral group gradually increased their aim point as the opponents' aim point increased (Fig. 4a; r = 0.82). In contrast, there was no such correlation in the risk-averse group; the subjects maintained their aim point even though the opponents' aim point decreased (Fig. 4b). When the individual task was repeated, … ; competition with sub-optimal, risk-averse opponents led to optimal decision-making ….
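The correlation reported for the risk-neutral group is a Pearson coefficient between block-wise mean endpoints. A plain-Python sketch, assuming the opponents' and the subject's block means are given as two equal-length lists; the function name is hypothetical, and on Python 3.10+ `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between, e.g., opponent and subject block-mean endpoints."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical block means (cm) for four competitive blocks.
opponent_means = [27.0, 27.2, 27.4, 27.6]
subject_means = [28.0, 28.3, 28.6, 28.9]
print(pearson_r(opponent_means, subject_means))
```

A flat subject response to a falling opponent aim point, as in the risk-averse group, drives `syy` and therefore r towards 0, which is why that group shows no correlation rather than a negative one.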

Discussion
For over a decade, sub-optimal and risk-seeking behaviours have been repeatedly confirmed in motor decision-making tasks that use an asymmetric gain function 4,5,6,8 , that require a choice between options with different pay-off variances 2,3,7,9 and that involve a speed-accuracy trade-off 10 . Here, we assessed the potential effect of interaction with opponents on sub-optimal motor decision-making, predicting that other people's actions and intentions can influence the subjects' motor system 10,11,20,21 . First, we found that the subjects' risk-seeking strategy in the individual task reversed to a risk-averse strategy at the very beginning of the competitive task (Fig. 2). Second, optimal motor decisions were promoted by competition with a risk-averse opponent (Fig. 3). This optimal decision-making was induced by a non-linear influence of the opponents' decisions (Fig. 4).
The reversal of risk-sensitivity was robustly shown across several experiments (Fig. 2). However, this switch away from the strategy used in the individual task is irrational. In the individual task, the subjects were instructed to maximise the total score. At the beginning of the competitive task, when the subjects did not know the opponent's strategy, they should have maintained their original strategy to maximise the total score and beat the opponent. The data showed a large decrease in endpoint in the first trial, which recovered thereafter (Fig. 2). A possible explanation for this behaviour is that the subjects sought a better strategy, believing that they would compete against a weak opponent who aimed for a lower score. If the subjects had believed that the opponent was strong and would aim for a higher score, they would never have changed their original strategy. Therefore, the size of this decrease may reflect the subjects' risk premium 22 : they expected to recover the points in later trials even if they scored fewer points at the beginning. By sacrificing points early in the block, the subjects may have been seeking an optimal strategy to beat a weak opponent.
We also found that the subjects' motor decisions were non-linearly influenced by the opponents' decisions (Fig. 4). The subjects increased their aim point when the opponents also aimed for a higher score (Fig. 4a). In contrast, when the opponents aimed for a lower score, the subjects did not change their aim point (Fig. 4b). Therefore, the subjects adaptively altered their decisions according to the opponents' decisions rather than imitating them. If the opponents' decisions had linearly affected the subjects' decisions through imitation, the subjects would also have aimed for a lower score. The decision strategy that the subjects adopted can be interpreted as a variant of the win-stay lose-shift strategy 23 . Importantly, in terms of the win-stay part (Fig. 4b,d), the subjects decreased their aim point from the individual task and then let that strategy 'stay', rather than adopting the original risk-seeking strategy and letting it 'stay'. These inhibitory and non-linear effects of the opponents' decisions induced the subjects to make optimal, risk-neutral decisions. Further research is necessary to clarify how the opponent is modelled in the decision system to generate an optimal motor plan.
Strategic decision-making has been investigated in game theory tasks that require players to make discrete choices 24 . In the Prisoner's Dilemma game 25 , a standard game theory task, two prisoners each have two choices, cooperation or defection, which determine four possible pay-offs (prison sentences). In the current study, however, the subjects decided where to aim to beat their opponent. Such continuous choices (motor decision-making) are often required in competitive sports (soccer, tennis, baseball, golf, darts, etc.). The current study therefore highlights the characteristics of movement strategy in competitive situations. Specifically, we clarified how interaction with opponents improved sub-optimal motor decision-making. When humans practise a motor task alone (without opponents), repetitive practice has been shown to improve movement accuracy but not movement strategy 6 . Our findings suggest that competition with an opponent, particularly a risk-averse opponent, is an effective means of promoting an optimal, risk-neutral movement strategy. In behavioural economics, Richard Thaler (winner of the 2017 Nobel Memorial Prize in Economic Sciences) proposed the 'nudge' as a means of behavioural change, since human decision-making is systematically biased under bounded rationality (a nudge is a choice architecture that guides people towards a beneficial choice while maintaining their freedom of choice) 26 . The presence of other people can be interpreted as a nudge that alters sub-optimal motor choices. These findings may help sports trainers and coaches achieve better motor performance: the influence of other people should be considered when developing training protocols in sports.

Apparatus
We used a pen-tablet with a workspace large enough to measure the subjects' arm-reaching movements (Wacom, Intuos 4 Extra Large; workspace: 488 × 305 mm). The subjects made quick out-and-back reaching movements while holding the digitised pen on the pen-tablet (Fig. 1a).
The position of the digitised pen was sampled at ~144 Hz with a spatial resolution of 0.01 mm.
The subjects manipulated a cursor on a vertical screen (Asus, VG-248QE; size: 24 inches; refresh rate: ~144 Hz), whose position was transformed from the pen position with a maximum delay of 69 ms. The mapping between pen and cursor positions was 1:1. All stimuli were controlled using Psychophysics Toolbox 27,28 .

Experimental task
There were three tasks: a training task, an individual task and a competitive task. … (Fig. 1b). If the subjects did not return the cursor within 600 ms (time-out), the message 'Time-out. More quickly!' was presented with a warning tone. If the subjects successfully completed the trial, a yellow cursor (radius: 0.3 cm) appeared at the position of the reaching endpoint for 2 s. After this feedback period, the subjects proceeded to the next trial.

Training task
Before the individual task, a training task was performed to allow the subjects to practise the reaching movement. The subjects were required to reach the green boundary line.
After each movement, if the yellow cursor overlapped with the green boundary line, the message 'Hit!' appeared on the screen with a pleasant sound. The training task comprised 50 trials.

Individual task
In the individual task, the subjects were awarded points depending on the reaching endpoint (Fig. 1c). More points were awarded the closer the endpoint was to the green boundary line at 30 cm; however, the score for the trial fell to 0 if the endpoint crossed the boundary line. When such a mistrial occurred, a 'Miss!' message appeared on the screen with a flashing red lamp and an unpleasant alarm. Of note, 0 points were also awarded if the endpoint was within 7 cm of the start position, but no such trials were observed. In case of a time-out, 0 points were also awarded. In the feedback period, the current and total points were presented along with the reaching endpoint. The subjects were instructed to maximise the total points in each experimental block, each comprising 10 trials.
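The scoring rules above can be written as a small function. The text fixes only the zero-score conditions (crossing the 30 cm boundary, stopping within 7 cm of the start, or timing out); the linear rise and the 100-point maximum below are hypothetical placeholders for the gain function shown in Fig. 1c, and the function names are illustrative.

```python
from typing import Iterable

BOUNDARY_CM = 30.0       # green boundary line (from the text)
MIN_DISTANCE_CM = 7.0    # endpoints this close to the start score 0 (from the text)
MAX_POINTS = 100.0       # hypothetical: the real point scale is given in Fig. 1c


def trial_score(endpoint_cm: float, timed_out: bool = False) -> float:
    """Score for one trial under a hypothetical linear, asymmetric gain function."""
    if timed_out:
        return 0.0
    if endpoint_cm > BOUNDARY_CM:       # crossed the boundary: a 'Miss!'
        return 0.0
    if endpoint_cm < MIN_DISTANCE_CM:   # too short to score
        return 0.0
    # Assumed linear increase from 0 at 7 cm to MAX_POINTS at the boundary.
    return MAX_POINTS * (endpoint_cm - MIN_DISTANCE_CM) / (BOUNDARY_CM - MIN_DISTANCE_CM)


def block_total(endpoints: Iterable[float]) -> float:
    """Total points over one block (10 trials in the study); time-outs not modelled."""
    return sum(trial_score(e) for e in endpoints)


print(trial_score(18.5), trial_score(30.5))  # 50.0 0.0
```

The asymmetry is the point of the design: the score peaks right at the boundary and then drops to zero, so aiming exactly at 30 cm is never optimal for a noisy reacher.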

Competitive task
In the competitive task, the subjects performed the task against a computer opponent (Experiments 1 and 2) or a human opponent (Experiment 3). Each experimental block comprised 10 trials. When it was the subject's turn, they performed the reaching movement in the same way as in the individual task. After the feedback period, a red cursor (radius: 0.3 cm) was shown on the screen, indicating the opponent's turn. In the competitive task with the computer opponent, the opponent's cursor movement (trajectory) was automatically manipulated based on pre-recorded sample trajectories made by the experimenter. Each movement endpoint was determined by the pre-programmed algorithm described below (Design and Procedure). In the competitive task with a human opponent, two sets of pen-tablet and screen were prepared. The same stimuli were presented on each screen. A vertical partition was used to separate the two subjects, preventing verbal and non-verbal communication during the experiment. In the competitive task, the subjects were instructed to win the game by achieving a higher total score than their opponents at the end of each experimental block.

Design and Procedure
For the risk-neutral and risk-averse groups in Experiment 1 and Experiment 2, there were three experimental sessions: baseline comprising 5 blocks of the individual task, competition comprising 12 blocks of the competitive task and washout comprising 5 blocks of the individual task.
For the computer opponent, we randomly sampled the endpoint of each trial from a Gaussian distribution with mean $\alpha\mu^*$ and variance $\sigma^2$, where $\alpha$ is the coefficient that determines the opponent's risk-sensitivity and $\mu^*$ is the optimal mean endpoint maximising the expected reward given the variance of the reaching endpoint $\sigma^2$. Before each experimental block in the competitive task, we determined $\sigma^2$ by calculating the subject's reaching variance over the past 40 trials. This means that the computer opponent always had the same reaching accuracy as the subject. We then defined the computer's mean endpoint as $\alpha\mu^*$, so the opponent's endpoint could be manipulated dynamically by changing $\alpha$. For the risk-neutral opponent, $\alpha$ was set to 1 for all 12 experimental blocks; that is, the computer opponent always behaved as an optimal, risk-neutral decision-maker who gradually aimed closer to the boundary line as the reaching variance decreased. For the risk-averse opponent, $\alpha$ was set to 1 for blocks 1-4, decreased in steps of 0.015 for blocks 5-8 and was finally set to 0.925 for blocks 9-12; that is, the computer opponent behaved as a sub-optimal, risk-averse decision-maker who gradually aimed further from the boundary line. The differences in movements between the two computer opponents can be seen in Fig. 4a,b.
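A sketch of the computer-opponent procedure, assuming $\mu^*$ has already been computed for the subject's current variance (see Model assumptions) and that $\sigma$ is the sample SD of the subject's last 40 endpoints. The per-block step of 0.015 is inferred from $\alpha$ falling from 1 (block 4) to 0.925 (block 9) in equal steps; the function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence


def opponent_alpha(group: str, block: int) -> float:
    """Risk-sensitivity coefficient alpha for competitive blocks 1-12."""
    if group not in ("risk-neutral", "risk-averse"):
        raise ValueError(f"unknown group: {group}")
    if not 1 <= block <= 12:
        raise ValueError("block must be in 1..12")
    if group == "risk-neutral" or block <= 4:
        return 1.0
    if block <= 8:
        # Assumed step of 0.015 per block: 0.985, 0.970, 0.955, 0.940.
        return 1.0 - 0.015 * (block - 4)
    return 0.925


def opponent_endpoints(mu_star: float, subject_endpoints: Sequence[float],
                       alpha: float, n_trials: int = 10,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Sample one block of opponent endpoints from N(alpha * mu_star, sigma^2)."""
    recent = list(subject_endpoints)[-40:]   # the subject's past 40 trials
    sigma = statistics.stdev(recent)          # assumption: sample SD (n - 1)
    rng = rng or random.Random()
    return [rng.gauss(alpha * mu_star, sigma) for _ in range(n_trials)]


# Example: risk-averse opponent in block 6 for a subject whose mu* is 28.0 cm.
history = [27.5, 28.4, 27.9, 28.1] * 10      # hypothetical 40 past endpoints
block = opponent_endpoints(28.0, history, opponent_alpha("risk-averse", 6),
                           rng=random.Random(0))
print([round(e, 2) for e in block])
```

Because $\sigma$ is recomputed from the subject's own recent trials, the opponent always matches the subject's accuracy and differs only in where it aims.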

Model assumptions
Based on Bayesian decision theory [13][14][15] , we modelled the optimal mean endpoint by maximising the expected gain for a given sensorimotor variability, in order to quantify the subjects' risk-sensitivity and to define the computer opponents' endpoints. In this model, the expected gain $EG(\mu)$ for a selected aim point $\mu$ is calculated by integrating the gain function $G(y)$ weighted by the probability distribution of the movement endpoint $P(y \mid \mu)$:
$$EG(\mu) = \int G(y)\, P(y \mid \mu)\, dy$$
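We assumed that the actual movement endpoint $y$ is distributed around the selected aim point $\mu$ with sensorimotor variance $\sigma^2$ according to a Gaussian distribution:
$$P(y \mid \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right)$$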
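The model can be evaluated numerically. This sketch assumes a hypothetical linear gain $G(y)$ (0 outside 7-30 cm, rising to 1 at the boundary) in place of the exact Fig. 1c function, integrates $G(y)P(y \mid \mu)$ with the midpoint rule and finds $\mu^*(\sigma^2)$ by grid search; the names, grid step and integration resolution are illustrative choices, not the study's implementation.

```python
import math
from typing import Callable

BOUNDARY_CM = 30.0
MIN_DISTANCE_CM = 7.0


def linear_gain(y: float) -> float:
    """Hypothetical G(y): 0 outside [7, 30] cm, rising linearly to 1 at the boundary."""
    if y < MIN_DISTANCE_CM or y > BOUNDARY_CM:
        return 0.0
    return (y - MIN_DISTANCE_CM) / (BOUNDARY_CM - MIN_DISTANCE_CM)


def normal_pdf(y: float, mu: float, sigma: float) -> float:
    """Gaussian endpoint density P(y | mu) with standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def expected_gain(mu: float, sigma: float, gain: Callable[[float], float] = linear_gain,
                  lo: float = MIN_DISTANCE_CM, hi: float = BOUNDARY_CM, n: int = 1000) -> float:
    """EG(mu) by the midpoint rule over [lo, hi], the support of the gain."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        total += gain(y) * normal_pdf(y, mu, sigma)
    return total * dy


def optimal_aim(sigma: float, gain: Callable[[float], float] = linear_gain,
                lo: float = MIN_DISTANCE_CM, hi: float = BOUNDARY_CM,
                step: float = 0.05) -> float:
    """mu*(sigma^2): grid search for the aim point that maximises EG(mu)."""
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + k * step for k in range(n_steps + 1)]
    return max(candidates, key=lambda mu: expected_gain(mu, sigma, gain, lo, hi))


print(round(optimal_aim(1.0), 2), round(optimal_aim(0.5), 2))
```

As $\sigma$ shrinks, the returned aim point moves towards the boundary, which is why the risk-neutral opponents' aim points rose as the subjects' endpoint variability decreased.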
Given a subject's endpoint variance $\sigma^2$, we estimated the optimal mean endpoint by maximising the expected gain:
$$\mu^*(\sigma^2) = \arg\max_{\mu} EG(\mu)$$
To determine each subject's risk-sensitivity in the individual (baseline) and competitive tasks, we calculated the difference between the observed mean endpoint and $\mu^*$; positive values indicate a risk-seeking strategy and negative values a risk-averse strategy.
Fig. 2. … are plotted from the first to fifth trials after the competitive task started (a',b',d'-f') or when the individual task restarted (c'). The horizontal lines above the bar graphs indicate statistically significant differences: * p < 0.05, ** p < 0.01. Open circles denote individual subjects. In the preceding individual task, a risk-seeking strategy was adopted, as indicated by the deviation of the observed mean endpoint from the optimal one. However, the endpoint decreased at the beginning of the competition (a',b'), suggesting that the subjects shifted from a risk-seeking to a risk-averse strategy. This effect was robust even when the competitive task began after the individual task had been performed repeatedly (d'), when the opponents' endpoints were presented in advance (e') and when the subjects competed against human opponents (f'). When the subjects repeated the individual task, this strategy shift was not observed (c').

Fig. 3. Achievement of risk-neutrality influenced by the opponents.
Risk-sensitivity values, defined as the difference between the mean endpoint and optimal endpoint, were plotted for the individual task (baseline) and each block of the competitive task.
Positive values indicate a risk-seeking strategy, and negative values indicate a risk-averse strategy. Green, blue and magenta asterisks denote a significant deviation from the risk-neutral value (i.e. 0; p < 0.05) for the risk-neutral opponent, risk-averse opponent and individual groups, respectively. The absence of an asterisk indicates that the optimal, risk-neutral strategy was achieved. For the individual group, a consistent risk-seeking strategy was observed. This sub-optimal strategy partially improved in the risk-neutral group. In the risk-averse group, the risk-seeking strategy improved in all blocks of the competitive task except the first. The group comparison of risk-sensitivity values is shown in Supplementary Figure 3.
… Vertical lines indicate the mean values of each distribution. Permutation tests revealed a significant difference between the slopes of the regression lines, suggesting a smaller influence of the opponents' decisions on the subjects' decisions in the left half of the plot than in the right half.
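The slope comparison mentioned in this caption can be illustrated with a generic two-group permutation test. This is a sketch under stated assumptions: each half of the plot is a list of (opponent endpoint, subject endpoint) pairs, the statistic is the difference in ordinary-least-squares slopes, and group labels are shuffled under the null hypothesis that the two halves are exchangeable. The study's exact permutation scheme, number of permutations and split point are not given here, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (opponent mean endpoint, subject mean endpoint)


def ols_slope(points: Sequence[Point]) -> float:
    """Ordinary-least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("slope undefined: all x values are equal")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx


def slope_permutation_test(left: Sequence[Point], right: Sequence[Point],
                           n_perm: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on slope(right) - slope(left).

    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = ols_slope(right) - ols_slope(left)
    pooled: List[Point] = list(left) + list(right)
    n_left = len(left)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign points to the two halves at random
        diff = ols_slope(pooled[n_left:]) - ols_slope(pooled[:n_left])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Add-one correction so the p-value is never exactly 0.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical data: flat relation on the left, slope 2 on the right.
left = [(float(x), 0.0) for x in range(10)]
right = [(float(x), 2.0 * (x - 10)) for x in range(10, 20)]
obs, p = slope_permutation_test(left, right)
print(obs, p)
```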