The reliability of assistance systems modulates the sense of control and acceptability of human operators

Individuals are increasingly required to interact with complex and autonomous technologies, which often has a significant impact on the control they experience over their actions and choices. A better characterization of the factors responsible for modulating the control experience of human operators is therefore a major challenge to improve the quality of human-system interactions. Using a decision-making task performed in interaction with an automated system, we investigated the influence of two key properties of automated systems, their reliability and explicability, on participants' sense of agency (SoA), as well as the perceived acceptability of system choices. The results show an increase in SoA associated with the most explicable system. Importantly, the increase in system explicability influenced participants' ability to regulate the control resources they engaged in the current decision. In particular, we observed that participants' SoA varied with system reliability in the "explained" condition, whereas no variation was observed in the "non-explained" condition. Finally, we found that system reliability had a direct impact on system acceptability, such that the most reliable systems were also considered the most acceptable systems. These results highlight the importance of studying agency in human–computer interaction in order to define more acceptable automation technologies.

Quentin Vantrepotte 1,2, Valérian Chambon 1,3,4* & Bruno Berberian 2,3,4*
Healthy adults generally feel in control of their own actions and the effects of those actions, a feeling that is sometimes referred to as "sense of agency" 1. The importance of the sense of agency in human life cannot be overstated. The ability to envision oneself as an agent, endowed with the capacity to bring about changes in the external environment, plays a key role in assigning moral and legal responsibility 2 and is thought to serve as a key motivational force for human behaviour 3-5.
The potential impact of technology on human operators' sense of control has received increasing attention in recent years. Indeed, individuals are increasingly required to interact with more or less complex and autonomous technologies. A paradigmatic example of such human-machine interaction is the relationship between an airplane pilot and the flight assistance system. In the cockpit, the increasing level of automation introduced by the assistance system can sometimes generate ambiguity as to who controls the aircraft: the pilot or the automated flight assistance system. In a pioneering study, Berberian and colleagues 6 experimentally demonstrated this detrimental influence of automation on the sense of agency (SoA). Using a flight simulator, they investigated the evolution of participants' SoA during an aircraft navigation task with different levels of automation. They demonstrated a decrease in SoA as the level of automation increased, at both the explicit (verbal report) and implicit (temporal binding measure) levels 6.
The opposite, however, has also been observed. According to Wen et al. 7, being assisted during a pointing task can potentially enhance the agentive experience. The authors used a pointing task in which participants were assisted by a computer: only relevant commands from the participant were taken into account, with incorrect commands being ignored, so that the participant did not have full control over pointing movements. The results showed that computer assistance significantly increased participants' SoA compared to the condition in which all participant commands were executed (see also 8,9). This apparent contradiction reflects the paradoxical effects of automation on human behavior: automation can induce an effective loss of control in the human operator, but may also, under certain conditions, enhance the operator's experience of control.

Methods

Participants. Forty participants were recruited (27 females, mean age = 32.4, SD = 10.4). Sample size calculation was based on the effects observed in a previous study using a similar protocol 13. An a priori power calculation was performed using G*Power software 19, with a power of 0.80 and a two-sided alpha level set at 0.05. The number of participants required to detect a mean effect size of d = 0.4 (based on a similar study 13) in a pairwise comparison, with exclusion of ~10% of the sample on the basis of predefined exclusion criteria, was forty. In practice, 40 participants were recruited to reach 34 analyzable subjects (6 excluded) on the basis of our exclusion criteria (see "Data analysis" section for more details). The study was approved by the Inserm Ethics Evaluation Committee (CEEI, n. 21-810). All participants gave written informed consent before inclusion in the study, which was carried out in accordance with the Declaration of Helsinki 20,21. The inclusion criteria were being older than 18 years, reporting no history of neurological or psychiatric disorders, no auditory disorders, and normal or corrected-to-normal vision. Participants were all naive to the purpose of the study.
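For illustration, a power calculation of this general form can be reproduced in R with the pwr package, used here as a stand-in for G*Power. This is a minimal sketch assuming a two-sided paired t-test; the exact test family selected in G*Power is not stated beyond "pairwise comparison", so the n returned here is illustrative and need not match the reported sample size exactly.

```r
# Hedged sketch of an a priori power calculation, assuming a two-sided
# paired t-test (the authors used G*Power; their exact test family is
# not specified in the text).
library(pwr)

res <- pwr.t.test(d = 0.4,                 # expected effect size (Cohen's d)
                  sig.level = 0.05,        # two-sided alpha
                  power = 0.80,
                  type = "paired",
                  alternative = "two.sided")
ceiling(res$n)   # required number of analyzable participants
```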
The studies were conducted on the experimental platform (PRISME) of the Institut du Cerveau et de la Moelle épinière (ICM) in Paris. Participants were tested in groups (maximum 12 participants) in a large testing room designed for this purpose. Each participant was seated individually in front of a computer, isolated by partitions, and was equipped with noise-cancelling headphones. The study was divided into two sessions, on two different days, to reduce the effect of fatigue. Each session lasted 2 h, and participants were paid 40 euros after completing both sessions. The procedure was the same for both sessions 1 and 2 (training and calibration phases, followed by the experimental task).
Material and stimuli. The participants were seated in front of a screen at approximately 60 cm (HP 23i; resolution: 1920 × 1080, 60 Hz). A keyboard and noise-cancelling headphones were used to perform the experiment. As in the previous study, the experiment consisted of a combination of an avoidance task, derived from the Random Dot Kinematogram (RDK) experimental protocol, and a task requiring temporal estimation of the interval between an action and its effect, used as an implicit measure of participants' sense of control 22. The stimuli presented to participants were identical in every respect to those used previously (for details, see 13). Matlab R2016b (MathWorks Inc.) and the Psychophysics Toolbox 23-25 were used to develop and run the experiment.

General procedure.
Training and calibration phases. Each session was split into two experimental blocks.
The first block consisted of two training sessions and one calibration phase. The training sessions (15 trials each) were designed to familiarize the participant with the avoidance task and the temporal estimation task, whereas the calibration phase was used to estimate a psychometric curve for each participant based on RDKs.
The avoidance task consisted of detecting the orientation (left or right) of a target dot cloud while ignoring non-coherent dots, and of avoiding this target cloud by pressing the key corresponding to one of the two directional arrows (right or left arrow) (see Fig. 1). Participants who failed to achieve 75% accuracy repeated the training phase.
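To picture the stimulus, the frame-by-frame logic of a standard RDK can be sketched as follows. This is an illustrative sketch written in R for consistency with the analysis code below, not the authors' Matlab/Psychtoolbox implementation; all names and parameter values are hypothetical.

```r
# Generic RDK frame update: a proportion 'coherence' of dots moves in the
# target direction; the remaining dots move in random directions.
rdk_step <- function(xy, coherence, direction = c("left", "right"),
                     speed = 2) {
  direction <- match.arg(direction)
  n <- nrow(xy)
  coherent <- runif(n) < coherence                      # signal dots
  angle <- ifelse(coherent,
                  if (direction == "right") 0 else pi,  # shared direction
                  runif(n, 0, 2 * pi))                  # noise dots
  xy + speed * cbind(cos(angle), sin(angle))
}

dots <- matrix(runif(200, -100, 100), ncol = 2)  # 100 dots in a square field
dots <- rdk_step(dots, coherence = 0.4, direction = "right")
```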
This training was followed by a calibration phase, in which participants' performance at different levels of motion coherence was estimated using a double-staircase method and by fitting a psychometric curve to their data. The aim of this phase was to determine three levels of difficulty for the avoidance task (easy, medium and hard).
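As an illustration of this step, the psychometric fit and its inversion can be sketched in R. The sketch assumes a calibration data frame `calib` with one row per trial and columns `correct` (0/1) and `coherence`; it uses a plain probit GLM, which for simplicity ignores the 50% guessing floor of a two-alternative task, and the target accuracies are illustrative rather than taken from the paper.

```r
# Fit a psychometric curve (probit GLM) and invert it to find the
# coherence level that yields a target accuracy.
fit <- glm(correct ~ coherence, family = binomial(link = "probit"),
           data = calib)

coherence_for <- function(p, fit) {
  b <- coef(fit)
  (qnorm(p) - b[1]) / b[2]   # solve b0 + b1 * x = qnorm(p) for x
}

easy_level <- coherence_for(0.90, fit)   # illustrative "easy" target
hard_level <- coherence_for(0.60, fit)   # illustrative "hard" target
```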
Figure 1. Illustration of a typical trial during the experimental phase. A fixation cross indicated the start of the trial for 800 ms. Then, an RDK appeared for a short period (800 ms). Participants were asked to provide their response when the response screen appeared. A sound was played 100 ms to 1500 ms after the participant's response. Then, the participant was asked to estimate the delay between their response and the onset of the sound, to rate their decision confidence, and to rate their agreement with the forced choice. The trial ended after feedback on performance was given (correct: green tick; error: red cross).

Finally, the second training session involved performing the avoidance task followed by a temporal estimation task. The latter required participants to estimate the interval between their response (left or right directional arrow) and a neutral sound that occurred shortly after this action. This interval estimation task was used to characterize a phenomenon known as "temporal binding" (TB), which refers to the perceived compression of the temporal interval between a voluntary action and its external consequence 22. This compression is often seen as an implicit marker of the sense of agency: a shorter perceived interval would indicate a higher sense of agency over the subsequent outcome (for a review, see). We computed TB as the difference between the perceived and the actual delay between the button press and the sound, as routinely done in studies using the TB measure 26-28. Note that, unlike the original study 13, here we used only an implicit measure of SoA, in order to mitigate drawbacks associated with explicit measures, such as post-hoc rationalization effects or social desirability biases, which could potentially affect the subsequent acceptability measure (for example, participants might report that the system is more acceptable when they previously reported feeling more in control).
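To make the TB computation concrete, here is a minimal sketch in R, with hypothetical column names:

```r
# TB is the reported minus the actual action-outcome interval: negative
# values indicate temporal compression, i.e., a stronger implicit SoA.
library(dplyr)

trials <- trials %>%
  mutate(tb = reported_interval_ms - actual_interval_ms)

# Mean TB per participant and condition
tb_summary <- trials %>%
  group_by(subject, automation, explicability, reliability, difficulty) %>%
  summarise(tb = mean(tb), .groups = "drop")
```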
Once the first block (training and calibration) had been completed, participants carried out the testing phase.
Testing phase. The testing phase consisted of the avoidance task followed by the temporal estimation task (Fig. 1). A total of 432 trials were carried out, divided into 12 blocks of 36 trials. Of these, 216 trials were assigned to each level of the two-level experimental factors (Explicability, Automation and Difficulty; see below), and 144 trials were assigned to each of the reliability conditions (low/medium/high). Two levels of difficulty (easy/hard), corresponding to two of the levels estimated from the calibration phase, were used during the avoidance task.
Automation factor. Two types of trials were presented, in which the level of automation (free-choice vs. forced-choice) of the response was manipulated. In free-choice trials, participants chose the target arrow (left or right) according to the orientation of the cloud. In forced-choice trials, the computer preselected an arrow for the participant. Free-choice and forced-choice trials were randomly interleaved within each block. At the start of each trial, a brief sentence signaled the nature of the forthcoming condition (free trials: "You choose"; forced trials: "The system chooses") to avoid anticipatory free responses from the participant (see 14,29,30 for a similar procedure). Choice and system confidence were always aligned, with the system choosing the direction in which it was most confident.
Explicability factor. Different decision support systems were implemented, with different levels of reliability depending on the difficulty of the current trial. The reliability of a system was defined here by its performance, i.e., its rate of correct responses on the task (low/medium/high). In half of the trials, the system provided its confidence in its decision, while in the other half no confidence was provided. Trials with confidence were labelled "guided trials", whereas trials without confidence were labelled "unguided trials" (see Fig. 2). The confidence returned by the system could range from 1 (the system was not confident in its decision) to 100 (the system was absolutely certain of its decision). Importantly, the confidence provided by the system was calibrated according to the difficulty of the current trial and the reliability of the system. The assistance provided by the system was directly related to its reliability. Since participants produced an average of 100% correct responses on easy trials and 55% on difficult trials, the low-reliability system was on average worse than participants, the medium-reliability system was on average as good as participants, and the high-reliability system was on average better than participants (see Suppl. Table 1). The order of presentation of the reliability conditions was pseudo-randomly shuffled. To avoid system-related recency effects affecting interactions with other systems, all participants started with the 'medium' reliability system.

Time estimation task ("temporal binding", TB). The avoidance task was followed by a time estimation task, in which participants had to estimate the perceived time interval (1-1500 ms) between their response to the orientation of the cloud and a subsequent sound, choosing among three possible responses (250 ms, 750 ms, and 1250 ms).
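Returning to the explicability manipulation, one way to picture the coupling between reliability, difficulty, and reported confidence is the following hedged sketch. The paper's actual calibration rule is given in Suppl. Table 1 and is not reproduced here; we simply assume the system is correct with some probability and reports a confidence value that tracks that probability, with jitter. All names are hypothetical.

```r
# Hypothetical generator for a system's choice and calibrated confidence.
# 'p_correct' stands for the system's accuracy at the current reliability
# level and trial difficulty (an assumption, not the paper's rule).
simulate_system <- function(p_correct, true_dir = c("left", "right")) {
  true_dir <- match.arg(true_dir)
  correct  <- runif(1) < p_correct
  choice   <- if (correct) true_dir else setdiff(c("left", "right"), true_dir)
  # Confidence on a 1-100 scale, centred on the system's accuracy
  confidence <- round(min(max(rnorm(1, mean = 100 * p_correct, sd = 10), 1), 100))
  list(choice = choice, confidence = confidence, correct = correct)
}

simulate_system(p_correct = 0.85, true_dir = "right")  # e.g., reliable system
```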

Figure 2. Illustration of the different types of response conditions. The free-choice condition corresponds to trials in which the subject can choose the direction themselves. The forced-choice condition corresponds to trials in which the subject must follow the choice of the system (indicated by a red square). One of the two systems (system A) guides the subject by returning its relative confidence (between 0 and 100) in each of the two possible answers. The second system returns nothing (Unguided).

Decision confidence. Following the time estimation task, the participant was asked to indicate confidence in their performance on the avoidance task, using a Likert scale ranging from 1 (low confidence: the participant thinks they did not avoid the cloud) to 8 (high confidence: the participant thinks they avoided the cloud). Depending on the type of trial, the participant was asked to judge either their own performance (free trials) or the performance of the system (forced trials). Following the confidence rating, feedback (a green tick or red cross for correct or incorrect responses) was given to the participant on their performance on the avoidance task (Fig. 2). This feedback was included so that the subject would be informed of their own performance and that of the system, and could thus assess the reliability of the system's recommendations. It is important to note that TB was systematically measured before the feedback was delivered, and therefore could not be affected by the feedback itself.
Agreement with the forced choice. During the forced trials only, the participant was asked to indicate their agreement with the choice imposed by the system, using a Likert scale ranging from 1 (low agreement: the participant thinks the system is absolutely wrong) to 8 (high agreement: the participant thinks the system is absolutely right).
Acceptability measure. Finally, at the end of each block, participants rated the system's "acceptability" on two dimensions, usefulness and satisfaction, using a scale ranging from "not at all" to "completely". A total of 18 acceptability measures were collected per subject (one for each type of system encountered) and averaged to obtain a total acceptability score per subject (Suppl. Table 3).
Data analysis. Data were analyzed using the R software 31 and the ezANOVA package 32. For the temporal binding measure, the raw data were first filtered according to the interval reported by the participant for each interval category shown (250 ms, 750 ms and 1250 ms). Two exclusion criteria were used, based on a previous experiment with a similar paradigm 13. First, for each participant, an average interval was calculated for each category, and intervals that were two standard deviations above or below this average were considered outliers. We checked that no participant's data contained more than 7.5% outliers (no participant was excluded on the basis of this criterion). To further test the reliability of the reported intervals, a second exclusion criterion was defined on the basis of the correlation between perceived and actual intervals for each interval category. Six participants were excluded due to low, non-significant coefficients (Rs < 0.2).
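A minimal sketch of these two exclusion criteria in R, with hypothetical column names (for brevity, the correlation criterion is computed across pooled trials here, whereas the paper computes it per interval category):

```r
# Criterion 1: flag intervals > 2 SD from each participant-by-category mean.
# Criterion 2: exclude participants whose reported intervals correlate
# poorly with the actual intervals (R < 0.2).
library(dplyr)

flagged <- trials %>%
  group_by(subject, interval_category) %>%
  mutate(outlier = abs(reported_interval_ms - mean(reported_interval_ms)) >
                     2 * sd(reported_interval_ms)) %>%
  ungroup()

outlier_rates <- flagged %>%                 # should remain below 7.5%
  group_by(subject) %>%
  summarise(rate = mean(outlier), .groups = "drop")

excluded_subjects <- flagged %>%
  filter(!outlier) %>%
  group_by(subject) %>%
  summarise(r = cor(reported_interval_ms, actual_interval_ms),
            .groups = "drop") %>%
  filter(r < 0.2) %>%
  pull(subject)
```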
Performance on the avoidance task (% of correct responses and response times) and the agreement score with the forced choice were analyzed using 2 × 2 × 3 repeated-measures ANOVAs with explicability (guided vs. unguided), task difficulty (easy vs. hard) and system reliability (low vs. medium vs. high) as within-participant factors, whereas temporal binding, confidence scores and response times (see Supplementary Results) were analyzed using 2 × 2 × 2 × 3 repeated-measures ANOVAs with automation (free vs. forced-choice), explicability (guided vs. unguided), task difficulty (easy vs. hard) and system reliability (low vs. medium vs. high) as within-participant factors. Note that percentages of correct responses were only analyzed in the free-choice trials, where participants made an intentional choice. Main and interaction effects from all ANOVAs were further analyzed using Bonferroni-corrected post hoc pairwise comparisons. The normality assumption was met for the residuals of all dependent variables, with the exception of the confidence score (W = 0.9, p < 0.05); the data for this variable were therefore min-max normalized before analysis.
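For illustration, the repeated-measures ANOVA on temporal binding can be sketched with the ez package's ezANOVA function, cited above. The sketch assumes the hypothetical per-condition data frame `tb_summary` from the TB sketch, with factor columns coded as factors:

```r
library(ez)

tb_anova <- ezANOVA(
  data   = tb_summary,
  dv     = .(tb),
  wid    = .(subject),
  within = .(automation, explicability, difficulty, reliability)
)
print(tb_anova)

# Min-max normalization of the kind applied to the confidence scores
norm01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Bonferroni-corrected post hoc pairwise comparisons, e.g., on reliability
with(tb_summary,
     pairwise.t.test(tb, reliability, paired = TRUE,
                     p.adjust.method = "bonferroni"))
```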
Results

In summary, and as expected, the high-reliability system and the explainable system were the systems that most influenced participants to choose the correct direction (relative to the dot cloud). Participants also made more errors on hard trials when interacting with the low-reliability system.
In summary, participants' SoA was greater (i.e., perceived action-outcome intervals were shorter) when interacting with the low-reliability guided system (Fig. 4, left panel, red curves).
In summary, and as expected, participants found the guided and more reliable systems more acceptable, on both the usefulness and satisfaction dimensions.

Discussion
It is now accepted that providing assistance to human operators can improve their performance without altering their sense of control (SoA) 10,13,27. Thus, when the assistance system is effectively calibrated, both in terms of objective performance and metacognitive evaluation, the operator's SoA naturally benefits from the cues communicated by the system 13. One reason for this could be that a well-calibrated system allows for a better allocation of resources to the action or task. In this study, we sought to test this hypothesis further by determining whether, and to what extent, the SoA of human operators interacting with an assistance system is modulated when the reliability of the assistance varies. To do so, we selectively changed the level of performance (or reliability) of the assistance system and measured the influence of this change on participants' SoA and on the perceived acceptability of the system.
Reliability and sense of control. We first hypothesized that system reliability would modulate participants' SoA, and specifically the amount of control used to perform the task (see 13 for the relationship between SoA and control used). In particular, we predicted that the lower the reliability of the system, the higher the SoA would be, as low reliability requires the operator to commit more cognitive resources to the action to maintain control. We first observed that participants were able to discriminate between the different types of systems they interacted with. Thus, participants were overall more confident in their own decisions (Fig. 5) and more often disagreed with the system's choice when interacting with low-reliability systems. As expected, participants' SoA was modulated by system reliability. Specifically, participants' SoA was higher when they interacted with the low-reliability system. Our results are consistent with previous studies investigating "attentional allocation" strategies during human-machine interaction with imperfect assistance 33-35. These studies showed that when participants interact with a less reliable system, they commit more cognitive control to the task. Variations in our implicit measure of control (temporal binding) appear to reflect these variations in the amount of control allocated to the current task as a function of the reliability of the assistance system.

Explicability, automation and sense of control. Our second hypothesis was that communicating the system's confidence in the best decision to be made (explicability) would enable agents to better regulate the allocation of their control resources in the current task. We observed an interaction between the explicability and reliability factors: participants' SoA, as measured by temporal binding, was primarily impacted by the system's confidence in the guided condition. In other words, participants' SoA was highest when the system's reliability was low, but only when the system returned information about confidence in its decision; conversely, SoA did not vary with system reliability when no explanation was provided. Taken together, these results suggest that the metacognitive information provided by the system helps select the optimal policy for cognitive control engagement. Cognitive control engagement 36,37 is known to depend on detection and awareness of variations in task demands. It is likely that the metacognitive information provided by the system helps participants become aware of these variations in task demands and contributes to a more flexible engagement of cognitive control. Interestingly, in the presence of metacognitive information, the level of automation did not affect the level of SoA, such that participants experienced similar control in the free and forced conditions. An opposite pattern of results was observed in the unguided condition. Specifically, we found that SoA was highest when participants were forced to follow the system's choice. In other words, in the absence of metacognitive information about the system's confidence in its own performance, cognitive control regulation becomes more difficult, and the participant's SoA depends primarily on the assistance provided by the automated system.
Together, these results suggest that participants' SoA can be influenced by different sources of information, depending on their reliability but also their availability 3,38. When the system guides the decision (via metacognitive information), its reliability plays a major role in decision making and in the formation of the SoA (Fig. 4). This positive influence of explicability on participants' SoA suggests that the explicability of the system helps to proactively optimize the resources to be committed to making the right decision 39. In contrast, when the system provides no guidance (no metacognitive information), the reliability of the system does not influence the participant's SoA, which therefore relies on other types of information, such as the level of automation of the task (Fig. 4) 11,13.
These results are consistent with recent studies showing that making a system more "explainable", i.e., less opaque (e.g., via a bonus or confidence attached to its decision), benefits the SoA of participants interacting with it 13. Our results further suggest that the confidence communicated by the assistance system can be used by the participant as a metacognitive signal to adjust the amount of resources to invest in the current task (metacognitive control 40).

Explicability and acceptability.
We hypothesized that a decrease in SoA, caused by a decrease in system reliability, would be associated with a decrease in system acceptability 11,13. Our measures of acceptability, represented by the usefulness and satisfaction scores, were indeed positively influenced by system explicability. Critically, both guided and unguided systems had the same levels of performance, yet the guided system was generally perceived more favourably. As already suggested in the literature 11,13, improving the explicability of systems, by making them more intelligible to humans, improves their usefulness, and the communication of metacognitive information could contribute to making the automated system more cooperative. Interestingly, our measures of acceptability were also positively influenced by system reliability: participants rated the high-reliability system as more acceptable overall (Fig. 5). As noted above, this result suggests that the measures of control and acceptability used in this study do not index the same thing. While the temporal binding measure would be sensitive to the demands of the task at hand, and would therefore reflect a proxy for the amount of resources that subjects believe they should invest in the task, acceptability would be directly related to the experience of control felt by operators when interacting with the system 41.

Conclusions and perspectives
Our results show that participants' SoA can be influenced by different sources of information. Metacognitive information, when provided by the system during the decision (i.e., in guided conditions), appears to play an important role in decision making and SoA formation, by allowing the participant to engage the appropriate control resources in the task at hand. When the system does not deliver metacognitive information (i.e., unguided conditions), participants rely on other types of information, such as the level and nature of task automation. Our results presuppose a strong link between the resources engaged in the action and the temporal binding measure (used as a proxy for the participant's control experience). Interestingly, recent studies 42,43 have linked cognitive resources to response times. We therefore examined response times, and a significant effect of system reliability was observed (see Supplementary Analysis). However, these results must be taken with caution, as the protocol was not specifically designed to analyze response times, i.e., we did not ask participants to respond as quickly as possible. To further test whether interaction with a more or less opaque (or reliable) decision support system leads to different management of attentional resources, future research could probe more directly the control resources engaged by the participant during the task, either using behavioral measures (such as the "control used" scale used in 13,41) or using objective metrics such as eye-tracking. Time spent on a particular area of a screen is traditionally considered an indication of difficulty in extracting relevant information from that area 44,45. Eye-tracking could thus help determine the participant's focus of attention during the task, as well as the duration of this focus, and identify the moment when the participant becomes aware of the information returned by the system. Eye-tracking, as an indirect measure of attentional resource management, would usefully complement the behavioral measures employed in this paradigm.
Another potential limitation of the study is the apparent contradiction between the confidence and SoA measures: participants reported greater confidence in their decisions in unguided free-choice trials than in forced-choice trials, while showing the lowest SoA in these conditions. We believe, however, that this contradiction is only apparent and stems from the fact that confidence and SoA do not measure the same thing: while SoA measures, or expresses, causal responsibility for the outcome of a decision, confidence instead estimates the subjective probability of being correct in one's decision 46. Our tentative explanation is that, under conditions of (unguided) free choice, participants are more likely to monitor their internal performance-related signals rather than focus on external recommendations, and therefore tend to express greater confidence in their choices as a result.
Making a system more explainable (or intelligible to the human operator) in its choices may reduce the physical and cognitive distance between the system and the operator, a distance that can adversely affect both performance and user experience in human-computer interactions 47. Our results support the idea that the subjective experience of control in human-machine interaction could be used as a guideline to make systems more agentic and acceptable 11,13.

Figure 3. (a) Mean Correct Response Rate (in %) associated with the Difficulty (Easy vs. Hard) and System Reliability (Low vs. Medium vs. High) factors (left panel). (b) Mean Correct Response Rate (in %) associated with the Explicability (Guided vs. Unguided) and System Reliability (Low vs. Medium vs. High) factors (right panel). Error bars show ± 1 within-subject standard error of the mean (S.E.M.).

Figure 4. Average temporal binding associated with the automation (free vs. forced), explicability (guided vs. unguided), system reliability (low vs. medium vs. high) and difficulty (easy vs. hard) factors. Solid lines: easy trials; dashed lines: hard trials. Error bars show ± 1 within-subject standard error of the mean (S.E.M.).

Figure 6. (a) Mean Satisfaction Score associated with the Explicability (Guided vs. Unguided) and System Reliability (Low vs. Medium vs. High) factors (left panel). (b) Mean Usefulness Score associated with the Explicability (Guided vs. Unguided) and System Reliability (Low vs. Medium vs. High) factors (right panel). Error bars show ± 1 within-subject standard error of the mean (S.E.M.).