Game theoretical trajectory planning enhances social acceptability of robots by humans

Since humans and robots are increasingly sharing portions of their operational spaces, experimental evidence is needed to ascertain the safety and social acceptability of robots in human-populated environments. Although several studies have aimed at devising strategies for robot trajectory planning to perform safe motion in populated environments, a few efforts have measured to what extent a robot trajectory is accepted by humans. Here, we present a navigation system for autonomous robots that ensures safety and social acceptability of robotic trajectories. We overcome the typical reactive nature of state-of-the-art trajectory planners by leveraging non-cooperative game theory to design a planner that encapsulates human-like features of preservation of a personal space, recognition of groups, sequential and strategized decision making, and smooth obstacle avoidance. Social acceptability is measured through a variation of the Turing test administered in the form of a survey questionnaire to a pool of 691 participants. Comparison terms for our tests are a state-of-the-art navigation algorithm (Enhanced Vector Field Histogram, VFH) and purely human trajectories. While all participants easily recognized the non-human nature of VFH-generated trajectories, the distinction between game-theoretical trajectories and human ones were hardly revealed. Our results mark a strong milestone toward the full integration of robots in social environments.

The widespread diffusion of service robots for diverse applications is making autonomous robots more and more pervasive in our lives 1 . In the near future, autonomous robots will likely coexist and share our very space. Application scenarios will be characterized by populated and dynamic environments, where autonomous navigation has to ensure not only the physical safety of human subjects, but also a great degree of social acceptability 2 . Trajectory planners at the state of the art mostly aim at ensuring the former requisite [3][4][5] , while seldom tackling the social acceptability issue. Most of contemporary autonomous navigation algorithms model humans as inanimate dynamic obstacles rather than social entities interacting with each other through complex and strategized patterns 6 . The oversimplification of human behavioral traits in the design of navigation algorithms may have severe consequences, such as the emergence of the well-known "freezing robot problem" 7 .
The freezing robot problem occurs when the environment exceeds a certain level of complexity and the robot is no longer able to manage it because of some deficiencies in the navigation algorithm, for example in the prediction of the human motion model. This context could lead to conditions in which the robot considers all paths unsafe, so it freezes its motion (or makes unnecessary maneuvers) to avoid collision with humans 7 .
Socially-aware navigation is gaining momentum as a fundamental requirement for the design of social robots, able to adhere to the social convention of providing a friendly and comfortable interaction with humans. Sociallyaware navigation combines perception, dynamical system theory, social conventions, human motion modeling, and psychology. Trajectories generated in this context should be predictable, adaptable, easily understandable, and acceptable by humans 8 . Toward improving trust, comfort, and social acceptance, humans should be explicitly considered by robots as intelligent agents who interact and may influence the motion of others 9 . Recent efforts in socially-aware navigation model humans as static entities 10 or as agents driven by very simplistic motion models 11 . Such simplistic assumptions may hardly cope with the complexity of human behavior and interaction, yielding trajectories that are far from predictable, smooth, and in turn acceptable for humans. Models based Methods Figure 1 schematizes the proposed procedure for the realization and validation of our game-theoretical framework for the social acceptability of robotic trajectories. The methodology can be subdivided in four main logical phases, corresponding to the panels in the figure. First, a game-theoretical model of pedestrian motion is devised and its parameters are tuned on the basis of the analysis of human motion videos (panel (a)). Second, a robotic trajectory planner informed by the game-theoretical pedestrian model is realized. The robot is deployed and operated in a virtual humanly-populated environment, where humans execute real trajectories extracted from videos. In this phase, three important performance parameters in robotic trajectory planning (path length ratio, path regularity, and distance to the closest pedestrian) are evaluated and compared with the state-of-the-art VFH algorithm (panel (b)). Third, the virtual environments containing humans and the robot are processed and prepared to be administered for the validation survey questionnaire (panel (c)). Finally, the survey questionnaire is administered and the results are collected and analyzed (panel (d)). In the following, the main components that constitute our methodology are illustrated in detail. Game-theoretical model. Assumptions. The assumptions supporting our game-theoretical model for human motion are listed in what follows. To improve readability, here and henceforth we will refer to human subjects as agents. This term will be also used for the robot when no distinction between the two categories is required.
All pedestrians are rational agents with common knowledge moving in a 2D populated dynamic environment. Rational behavior entails that agents only aim to reach their own individual motion goal (i.e. the location to which the agents wish to go). In mathematical terms, this translates into a minimization of an individual cost (equivalently, a maximization of an individual benefit), such as their overall path length 33    Such an assumption is reasonable when dealing with models of human traits, as individuals commonly learn these skills by experience during everyday life 9 .
We consider a populated dynamic environment, possibly busy, but not crowded, such as typical streets occupied by pedestrians walking on sideways, or populated indoor spaces, such as hotel halls 9 . We suppose that the environment contains static obstacles, which have to be avoided by agents in a natural manner. Our approach is based on a microscopic modeling strategy, whereby a single individual is mapped onto a single software agent, which mimics the individuals' decision and their interactions.
Game description. The proposed model for pedestrian motion is a non-cooperative, static, perfect information, finite, and general-sum game with many players (or agents).
In our model, each agent aims at reaching its own goal individually, but the minimization of its individual cost function does not exclude the possibility to collaborate with other agents, should this help attaining individual goals 35 as well.
Our model recognizes as groups those pedestrians that move staying close to each other keeping a similar direction of motion. These groups of agents are considered as single players, whereby members of the group share a common strategy and a common motion pattern. This last assumption practically entails that the navigation strategy of the robot in avoiding human groups would treat them as a compact group of people that cannot be split to better attain its own navigation goal.
The game is static in the sense that agents move and take decisions simultaneously, it is based on perfect information, that is, each agent knows the current and the previous actions of all agents, e.g. via direct observations.
The game is also finite, i.e., the game has a finite number N of agents belonging to the agent set N , where each agent i ∈ N can choose among a finite number of actions available, defined by the action set , which is supposed to be common to all agents. In particular, we indicate with θ i (t) ∈ � the action executed by agent i at the discrete time t. In our application, the execution of action θ i (t) corresponds to a motion of agent i in the 2D plane at constant velocity v and constant heading θ i (t) over the whole discrete time step t . We assume that agents have a bounded visibility angle and that θ i (t) is designed to uniformly partition such an angle. We denote by p i ∈ R 2 the position of agent i in the 2D environment, with respect to a fixed orthogonal reference frame.
Moreover, the proposed model is a general-sum game, i.e., the sum of all gains and losses of the utility functions over all agents is not necessarily equal to zero.
Similar to 20 , we postulate that, in such a navigation task, agents tend to reach a Nash equilibrium-the condition in which no agent has an incentive to unilaterally change its own action (or strategy) if the other agents do not change theirs. In other words, a Nash equilibrium occurs when each agent achieves its best response, i.e., its minimum individual cost, given the actions of the other agents. In general, however, existence and uniqueness of a Nash equilibrium is not guaranteed in our setup, and its analytical characterization is almost always impossible to obtain, thus making numerical approaches for an approximate computational necessary. Here, the Nash equilibrium is approximately computed via the sequential best response approach 36 .
Let us explain the idea of the sequential best response for two agents, A and B: agent A observes the motion of agent B and then solves an optimization problem to determine its own trajectory, given the latest observation of agent B. Afterwards, a check action is performed, verifying if the strategies of both agents are the same as those computed in the previous iteration; in such a case, the game has reached a Nash equilibrium. Otherwise, agent B computes its optimal strategy, given the latest observed strategy of agent A. The procedure is applied iteratively, until the equilibrium condition is met. The same strategy identically extends to N agents.
Our modeling procedure assumes that all the agents in the planar space play the game mentioned above. After the model has been identified, we will use it to control a single, synthetic agent to navigate through the populated environment. Such an agent is called robot player.
Optimization problem. The sequential best response approach in our game-theoretical model for human motion in a populated environment requires the solution of a set of interdependent optimization problems, one for each agent moving in the environment. The goal of the optimization problem for each agent i is to find the best sequence of actions, θ * i = (θ i (t), θ i (t + �t), θ i (t + 2�t), . . . , θ i (t + T�t)) , over a finite prediction horizon T t , given the actions of the other agents. Without loss of generalization and to improve readability, here and henceforth we assume a unitary discrete-time step, i.e., t = 1.
All agents seek for the Nash equilibrium by applying the sequential best response strategy, solving their own optimization problem on the basis of the observed behavior of the rest of the population. We define the optimization problem for each agent i ∈ N as where the three summands are defined as follows: (i) The term � goal (θ i ) tends to reduce the overall path length for each agent i and, hence, models the goaloriented attitude of the agent: with γ (t) being a time-variant weight factor; p i (t, θ i (t)) is the estimated position of agent i at time t, considering a constant speed modulus v and the heading control action θ i (t) applied at time t, computed using the kinematic update Eq. (2); and p * i is the estimate of agent i's goal over the time horizon T. In the absence of an explicit definition of a pedestrian's goal, we assume that, within the horizon [t, t + T] , the goal of agent i lays on a straight line starting in p i (t) and oriented along the observed agent heading at time t. Under these assumptions, the practical meaning of the time horizon T is the estimate of the time interval within which a pedestrian sets up and maintain their walking goal.
(ii) The term � smooth (θ i ) penalizes excessive rotations, thus promoting smooth trajectories. In fact, during navigation, humans tend to avoid too many changes of orientation to minimize their energy consumption 34 : where θ i (t) , θ i (t − 1) are the orientation of the agent at time t and (t − 1) , respectively. We observe that the term � smooth (θ i ) is weighted in a complementary fashion to � goal (θ i ) , to satisfy the assumption (further detailed in the Implementation section) of their relative importance as long as the agent approaches its target.
(iii) The term � obs (θ i ) tends to optimize the natural interaction with static objects. In fact, humans tend not to walk too close to static obstacles, unless it is necessary. For this reason, we model this behavior as a soft constraint: where ρ is a weighting factor and the denominator in (6) is the distance between the agent position p i (t, θ i (t)) and the closest static obstacle p obs at time t. The exact procedure to compute p obs will be explained later. Practically, (6) penalizes small distances between an agent and static obstacles.
The inequality in (1b) is a hard constraint imposing to avoid other agents, assuming a circular region around agents as their personal space 28 to be avoided. In this way, agent i is required to maintain at least a minimum distance β with other agents in the observed scenario. Constraint (1c) models the avoidance of static obstacles by imposing that the position p i (t, θ i (t)) is outside the obstacle space O obs , defined as a subset, possibly disconnected, of the 2D planar space, occupied by obstacles, where motion of agents is forbidden.
Equation (2) formalizes the kinematic update of the position of agent i at time t, subject to a heading command θ i (t) , at a constant velocity v.
Validation. The proposed game-theoretical human motion model is validated by conducting a qualitative comparison between generated trajectories and human ones, observed in open-source surveillance videos 37,38 . These surveillance videos, used to validate the proposed model, show a typical urban scenario in which multiple agents walk interacting with each other and avoiding static obstacles. Figure 2 illustrates randomly selected frames of such surveillance videos in two different scenarios. Specifically, Fig. 2 compares real trajectories executed by humans (Fig. 2a,c) with the estimated trajectories generated for all agents by the proposed model solving our game-theoretical problem (Fig. 2b,d).
We observe that, in both the illustrated scenarios, our game-theoretical approach generates collision-free trajectories (Fig. 2b,d) that are smooth and resemble those executed by their human counterparts. However, we note that the trajectories generated by our algorithm exhibit a sharper reaction than humans in the vicinity of surrounding agents. This is evident when comparing Fig. 2a,b, focusing on the interaction between the green trajectory and the blue one. A comparable circumstance can be observed in Fig. 2c,d, with reference to the yellow trajectory. This phenomenon is most likely caused by the discrete action set associated with each agent. Notably, in our implementation an agent can choose one out of seven possible headings inside their own visibility zone, resulting in a resolution of ± π/6 rad , in the attempt of minimizing the corresponding cost function. On the other hand, human subjects can select their heading over an infinite set.
A further cause of discrepancy between human and game-theoretical trajectories resides in the kinematic update of the agent position in Eq. (2)-a linear update with constant heading and velocity over the whole sampling step-and the estimation of the human target, assumed to be constant over an interval of duration T-actually an unknown, subject to the very stochastic nature of human behavior.
Algorithm. www.nature.com/scientificreports/ The game-theoretical model of pedestrian motion described above is used to inform a robotic trajectory planner for autonomous robots moving in populated environments. www.nature.com/scientificreports/ The main steps executed by the proposed trajectory planner are described in Algorithm 1. First, the robot position ( p robot ) is initialized using the function InitializeRobotPosition . Then, the algorithm executes an iterative procedure that stops when the robot reaches its target position ( p goal ). Here, we will refer to both humans and the robot with the term "agent". Each iteration performs five main steps: recognition of groups of humans ( GroupRecognition ), first estimation of trajectories for all agents ( FirstEstimation ), collision checking between agents and with obstacles ( CheckCollision ), computation of the agent trajectory ( ComputeSolution ), and update of the robot position using the computed trajectory ( UpdateRobotPosition ). This iterative procedure predicts the agents' motion and generates the optimal robot trajectory over the fixed time horizon T, by applying the strategy detailed below. After such an optimal trajectory for the robot is computed, only the action corresponding to the first time step is actually applied to the robot and the process is repeated until the robot reaches its goal.
In the following, each step of Algorithm 1 is detailed: • GroupRecognition . The algorithm performs the group recognition of agents, considering the observed orientation of each agent and the distances between them. In fact, a group is typically moving while maintaining a common orientation and keeping a distance between agents shorter than the typical personal space of the single agent. Upon recognition, groups are considered as unique entities and treated as single agents in the subsequent phases. • FirstEstimation . A preliminary estimation of all agents' trajectories (i.e., θ ) is performed, projecting hypothetical rectilinear trajectories over the interval T. • CheckCollision . Given the trajectories of all agents ( θ ), previously estimated by the FirstEstimation , the CheckCollision function detects the possible occurrence of collisions between an agent i with obstacles and other agents, activating the flag variables C obs and C agents , respectively. In particular, we refer to the occurrence of a collision each time the individual personal space of an agent is violated. • ComputeSolution . Considering the estimated trajectories ( θ ), and the flags C obs and C agents , Algorithm 2 computes a solution of the motion planning problem for an agent i selecting one of the possible following cases: (i). if a collision with other agents is envisaged, two alternative solutions are evaluated. Then, the solution that involves the lowest cost of Eq. (3) will be selected. The first solution ( θ i gt ) is computed using the strategy defined in Algorithm 3, where trajectories are generated seeking for a Nash equilibrium solution of the game presented in the Game description section. The second solution is computed through the Decelerate function, which evaluates the opportunity to decelerate-a typical human behavioral trait in navigation-to avoid the collision with other agents. In particular, after identifying the discrete time step t at which a collision between agent i and other agents is envisaged to occur, the cost associated with sixteen different deceleration patterns is evaluated using the cost function (3), provided that constraints in Eqs. (1b) and (1c) are satisfied; (ii). if an agent is envisaged to collide with a static obstacle ( C obs ), the agent solves its individual optimization problem described above (without playing the game and, hence, not seeking for the Nash equilibrium); (iii). if no collision between agents or static obstacles is envisaged, trajectories are maintained as straight lines, keeping the current heading and a constant velocity, practically implementing what was already computed in the FirstEstimation procedure.
• UpdateRobotPosition . Considering the computed trajectory of the robot, the action corresponding to the first time step is executed and the robot position is updated using Eq. (2). www.nature.com/scientificreports/ Implementation. The algorithm presented above has been implemented in Matlab and the main implementation features are discussed in what follows. The discrete time step has been set to t = 1.2 s . The time horizon for optimization has been set to T = 4 , that is, 4.8 seconds. Please note that in the main paper we assumed a unitary discrete time step to enhance readability.
As previously stated, each agent can execute actions taken from an action set of finite size. Specifically, in our implementation, each agent has seven possible actions for θ i (t) , which represent possible relative headings to follow within the agent visibility zone. Namely, takes values in the finite set � = {− π/2, − π/3, − π/6, 0, π/6, π/3, π/2} rad . We remark that we limited the cardinality of to seven, pursuing a trade-off between a satisfactory performance and a reasonable computational complexity of the algorithm.
In Eq. (1b), the β parameter is set considering the Hall convention 28 that posits the existence of a personal space of circular shape that ensures comfort conditions for human navigation. The value of β has been estimated through the analysis of the open-source surveillance videos 37,38 .
In Eq. (3), the term ( � obs (θ i ) ) can be neglected if the first estimation of the agent trajectory does not intersect any static obstacle. Otherwise, � obs (θ i ) in Eq. (6) is computed referring to the closest obstacle, toward which the agent is projected to collide. Then, the closest point of such obstacle to the agent position is computed ( p obs ). To reduce the computational load, obstacles are mapped into a discrete spatial map overlapping with the 2D environment. The map consists of a rectangular matrix of 576x720 cells, which are marked as being occupied by an obstacle or free from them. Each cell covers approximately a square of 1.8x1.8 cm.
The weight γ (t) in Eqs. (4) and (5) is selected as a time-varying term that is used to balance the relative importance of terms � goal (θ i ) and � smooth (θ i ) over the optimization horizon T. This choice emerges from the analysis of the available surveillance videos, where we observed that the minimization of the distance to goal typically prevails on the smoothness requirement as long as the agent gets closer to their goal, and vice versa. Considering T = 4 time steps, we chose the following sequence for γ (t) , starting from a generic time instant t * : Trajectories generation for performance parameters. We designed trajectories for a preliminary assessment through performance parameters evaluated in three experimental conditions, which differ for the algorithm governing the motion of a selected agent (i.e., either a robot or a human being): in the condition humans only (HO), all the agents were human beings moving in a real environment; in the condition humans and GT (GT), one of the agents was controlled by our game-theoretical algorithm, while the other agents were human beings; and in the condition humans and VFH (VFH), one of the agents was controlled by the VFH algorithm 32 , while the other were human beings. Each experimental condition comprises seven different experiments (i.e. seven different trajectories), differing for the start and the goal chosen for the selected agent, the quantity of human subjects involved in the interaction, and their motion patterns.
The virtualized environment is constructed by processing movies collected from surveillance cameras of populated environments 37 , obtaining a 2D arena where virtual agents reproduce the human motion captured in the movies. In the HO condition, the performance parameters are evaluated in the original arena, with reference to a randomly selected human being. In the GT and VFH conditions, a virtual agent is introduced in the arena and commanded to use the given trajectory planner (GT or VFH) to navigate through the existing virtual agents, corresponding to human beings.
Survey questionnaire, a-priori power analysis. Survey questionnaire. The proposed methodology is validated using a variation of the Turing test 39 , which evaluates whether the robot behavior, controlled by the game-theoretical method, is comparable to or indistinguishable from human navigation patterns.
The variation of the Turing test consists of an online survey questionnaire composed by three main parts: (i) in the first part, the participants underwent a training phase to familiarize with the working environment (see Fig. 3a www.nature.com/scientificreports/ over a gray background- (Fig. 3c illustrates a frame of a single experiment); (iii) in the third and final part, the participants watched the same 21 videos (but in a different random order), where they were asked in addition to focus on a circled arrow (Fig. 3d illustrates an example of a frame of a single experiment). The participants were unaware that the circled arrow targeted a random human agent in the HO experimental condition and the robotic agent in GT and VFH experimental conditions. We remark that the seven experiments used for the survey questionnaire are identical to those used to evaluate the performance parameters computed in the previous section. The execution of each part entails answering specific questions. In the first part of the survey questionnaire, the participants were required to provide their gender, age, and level of experience in robotics field on a Likert scale 40 from 1 (no experience) to 5 (expert).
During the training part, the participant is guided from a typical urban scenario of Fig. 3a to the particular scenario used in the other parts of the test illustrated in Fig. 3c. The intermediate scenario of Fig. 3b is designed to gradually guide the participant to the final set-up.
In the testing scenario of Fig. 3c, agents (pedestrians and robot) have been replaced with arrows and the urban environment has been removed to prevent the participant from focusing on the scenario, rather than on the movement of agents.
In the second part, the participant watches 21 videos randomly (about 15 seconds each) consisting in three different experimental conditions: 7 videos show an environment with only pedestrians (HO); other 7 videos a scenario with pedestrians and a robot controlled with an algorithm at the state-of-the-art (the Enhanced Vector Field Histogram 32 ) (VFH); and the remaining 7 videos show a scenario with pedestrians and a robot controlled with the proposed algorithm (GT). In all experimental conditions, robot trajectories are re-planned with a frequency of 2 Hz.
To assess the level of social acceptance of our game-theoretical trajectories, in the second part (following habituation), we asked the participants to say if they perceived "weirdness" in the motion observed in the videos, and then to indicate which is the perceived "weird" arrow, if any, as shown in Fig. 3c.
In the third part, participants were requested to determine whether the circled arrow is a human or not. Then, participants were asked to rate the naturalness of the motion of the circled arrow on a Likert scale 40 defined in a range from 1 (completely unnatural) to 5 (completely natural). The test takes about 20 minutes to be completed properly. The test has three rules: (i) the participant cannot pause the video; (ii) the participant can watch videos only once; (iii) the participant should complete the test without interruptions or distractions.
A-priori power analysis. Preliminary, we conducted an a-priori power analysis to estimate the number of participants required to provide acceptable and significant statistical results 41 . To this aim, we used the free software G*Power 42 . First, we identified our case analysis as a non-parametric study, since non-parametric statistical tests pose no constraints or prerequisites on the data distributions 43 . Then, we assumed that the data collected after the a-priori study would be analyzed via the non-parametric Kruskal-Wallis test because our independent variables pertain to more than two independent groups (HO, GT, and VFH) and our dependent variables (the rating of the weirdness motion, human-likeness, and naturalness of movement) are ordinal.
Based on 41 , we computed the total sample size considering the ANOVA test 44 , i.e., the parametric-equivalent test of the Kruskal-Wallis one and then multiplied the result by the corrective factor ARE, obtaining the equivalent sample size of the non-parametric Kruskal-Wallis test. The result of the a-priori analysis for our non-parametric test is about 152 volunteers, considering an alpha level equal to 5% , power of the study 80% and the three groups, corresponding to the three different experimental conditions. We recruited the participants using the Institutional email of Politecnico di Torino and then we distributed an online survey questionnaire to students and university staff. Ultimately, we collected 691 responses, exceeding the sample size of 152.
Statistical analysis. Experimental data (both the generated robotics trajectories and the responses to the survey questionnaire) were preliminary assessed for normality distribution and homoscedasticity of variance (Levene's test). These analyses revealed that data violated the assumptions for parametric statistics. Thus, we decided to adopt a non-parametric test , i.e., Kruskal-Wallis 45 .
First, the quality of the trajectories generated by the two algorithms and the HO condition was evaluated. We first addressed whether they differed in terms of variability of path length ratio, path regularity, and distance to the closest pedestrian through the Levene's tests 46 . Significance level was set at p < 0.05 45 (for all statistical tests performed in this study), and paired post-hoc comparisons have been conducted-adopting a Bonferroni correction-when appropriate. Following these preliminary analyses, trajectory data have been analysed through the non parametric Kruskal-Wallis test.
Survey questionnaire data have also been analysed through Kruskal-Wallis test followed by Bonferroni posthoc analyses 47 . These analyses were aimed at assessing whether participants exhibited a differential appraisal of the different trajectories in terms of weirdness, human likeness, and naturalness. This statistical approach was adopted for all the questions in the second and third part of the survey questionnaire, except for the second question of the second part. In the latter, participants were asked to indicate the perceived "weird" arrow, if any. We posit that more weirdness should be perceived in agents driven by algorithms than in agents associated with human beings. For this reason, the answers expressed relative the HO scenario were not considered, since all arrows corresponded to human beings and an indication of weirdness would not make sense to our research question. As a consequence, in this specific instance, only two experimental conditions had to be compared (GT and VFH) and, to this aim, we used the Mann-Whitney test 48,49 .

Results
In this section, the results of the analysis conducted on the trajectories of the 21 experiments (seven experiments for each of the three experimental conditions) are presented. Then, the results of the survey questionnaire are illustrated and commented.
Analysis of performance parameters. Three widely adopted parameters, deemed as important for socially navigating robots, were evaluated across the three experimental conditions: the Path Length Ratio (PLR), the Path Regularity (PR), and the Closest Pedestrian Distance (CPD) 50 .
The PLR is defined as the ratio between the length of the line-of-sight path between the initial and final point of a path and the actual path length between the same two points 50 . A higher path length ratio is usually preferred, since it indicates that an agent minimizes the length of the path to reach its goal. We computed the PLR for each experiment and we illustrate its average values across the three experimental conditions in Fig. 4a. The results in Fig. 4a suggest that the HO scenario was characterized by the highest average PLR, followed by GT and VFH.
The PR quantifies to what extent a path is similar to a straight line 50 . Upon normalization, PR = 1 corresponds to a straight path from start to goal. Values of PR closer to one are preferable, since they are indicative of a smoother motion, without excessive changes of direction. In Fig. 4b, the average PR for each experimental condition is illustrated, where the highest average value pertains to HO, followed by GT and VFH. These results appear in line with the tenet that humans tend to minimize their energy, thus avoiding sudden changes of orientation, and with the design principle of the VFH algorithm, which avoids obstacles only when the agent is close to them 32 , entailing swift changes of orientation to get away from them.
The CPD is defined as the distance from the closest pedestrian, normalized with respect to the maximum length measurable during experiments, which is the diagonal of the experimental arena. Also for this parameter, the attainment of values closer to one is desirable, as this implies a good tendency in staying clear from humans when following planned trajectories. Average values of CPD in the three experimental conditions are illustrated in Fig. 4c, where the highest average value is related to GT, followed by HO and VFH. The reason for the latter is presumably due to the purely reactive design of the VFH algorithm. We posit that the intermediate ranking of HO with respect to CPD is due to the ability of humans to evaluate situations on a case-by-case basis. www.nature.com/scientificreports/ While the rankings described above are suggestive of superior performance parameters attained by GT over VFH, the verification of the statistical significance of these comparisons is in order.
To preliminarily evaluate the quality of the trajectories generated by the two algorithms and the HO, we first addressed whether they differed in terms of inter-experiment variability of the three performance parameters through the Levene's test 46 . We evaluated the extent to which each algorithm generated trajectories that were similar to one another. Our analysis revealed that there exists a significant differential variability with respect to PR ( F 2,18 = 3.75 , p = 0.043 ). Thus, we performed a post-hoc analysis that revealed much more variability in the VFH videos compared to HO and GT (VFH vs. HO: p = 0.038 ; VFH vs. GT: p = 0.040 ; HO vs. GT: p = 0.97 ); similarly, albeit not statistically significant, we observed a trend toward increased variability in VFH experiments with respect to PLR ( F 2,18 = 3.22 , p = 0.064 ). Finally, the inter-experiment variability within each experimental condition was indistinguishable concerning CPD ( F 2,18 = 2.31 , p = 0.130 ). These results indicate that, albeit indistinguishable in absolute values, the reproducibility and predictability of each experimental condition in terms of PR and PLR was much higher in HO and GT than in VFH scenario.
With the Leven's test described above, we have not only shown that the variances of the 3 experimental conditions in PLR and CPD are equal but we have also shown that for these two parameters the assumptions for executing the Kruskal-Wallis test are satisfied. Thus, in line with this consideration, the null hypothesis for the Kruskal-Wallis is defined as follows: To this aim, Kruskal-Wallis analysis 51 was executed across the two performance parameters, revealing the absence of statistically significant differences ( χ 2 = 2.5 , p = 0.286 for PLR; χ 2 = 0.36 , p = 0.834 for CPD). The reason behind such observations is strictly related to the consideration of only seven experiments for each experimental condition, with differential degree of variability, and thus characterized by a limited statistical power.
For completeness, we executed an a posteriori power study to verify the limited statistical power, and we found that the statistical power considering only seven experiments per group is 6%, thus very limited. Survey questionnaire. We collected 691 responses to the survey questionnaire, where participants were in majority in their thirties with very little experience on robotics ( Table 1). The gender composition was slightly unbalanced toward men. The age range of our sample isu from 18 to 78 years old. www.nature.com/scientificreports/ A power analysis 41 indicated that the adequate statistical power was guaranteed with 152 participants. Since the number of participants largely exceeded the required sample size, we opted for a bootstrapping approach 52 , in which we randomly sampled 152 observations from the complete pool of responses and iterated this process 100 times. Adopting this procedure, we kept the sample size to the appropriate number (thus reducing the odds to obtain biologically irrelevant findings 53 ) and increased the generalizability of our findings by testing their robustness against repeated observations. Experimental outcomes were analyzed with the Kruskal-Wallis test to statistically reject the H 0 hypothesis and understand if there exist differences among experimental conditions.
Our null hypothesis posits: Hypothesis 3 (H 0 ) All experimental conditions (HO, VFH, GT) are perceived by participants as indistinguishable.
In the analysis of the results of the second part, in accordance with our expectations, the VFH condition was characterized by the highest level of weirdness compared to HO and GT conditions, which were in turn indistinguishable from one another (Kruskal-Wallis test χ 2 = 107 ± 13.5 and p < 10 −17 for all bootstrapping iterations; post-hoc analysis: for HO-VFH p < 10 −10 for all bootstrapping iterations, for GT-VFH p < 10 −14 for all bootstrapping iterations, for GT-HO p > 0.05 for 88 bootstrapping iterations out of 100, but the remaining has p > 0.01). Figure 5a illustrates the mean rank (in light of the bootstrapping procedure) in "weirdness" of motion (WM) along with its standard deviation.
Notably, GT and HO are indistinguishable from one another, while VFH is significantly different from GT and HO. Specifically, while VFH was considered "weird" in the majority of instances (61%), GT was considered "weird" much less often than HO videos (33% and 37%, respectively) (See Fig. 6).
We then asked the participants who detected weirdness in the videos to indicate which of the arrows exhibited such weirdness. We posit that more weirdness should be perceived in agents driven by algorithms than in agents associated with human beings. Our experiments indicated that the agent judged as weird was actually associated to a robot only in 16% of GT, while this proportion drastically increased to 47% of VFH (see Fig. 6 patterned bars). This finding, combined with the Mann-Whitney test ( U = 4(10 3 ) ± 489 , p < 10 −20 for all the bootstrapping interation considering the whole bootstrapping analysis), supports the view that the trajectories generated by GT are perceived as much more natural than those generated by VFH. Additionally, it suggests that the motion of the robot controlled by GT is perceived as more human-like than the one generated by VFH.
In the third part, we further delved into the subjective rating of the three motion patterns by asking participants to focus on the motion of a circled target agent and evaluate whether such motion corresponds to a human or not (human likeness), along with its degree of naturalness on a Likert scale from one (minimum naturalness) to five (maximum naturalness). When focusing on the qualitative measurements of the human likeness, we observed that VFH-related arrows were considered much less human-like (41.11%) than both GT (64.59%) and HO (80.31%). Thus, as illustrated in Fig. 5b, VFH is judged as the least human-like ( χ 2 = 142.55 ± 15.12 , p < 10 −22 for all Kruskal-Wallis bootstrapping iterations; post-hoc analysis: p < 10 −5 VFH-GT, p < 10 −30 VFH-HO) which is consistent with the previous part of the test, where VFH is perceived as generating the "weirdest" motion. Additionally, GT-related arrows were considered significantly less human-like compared to HO ( p < 10 −4 post-hoc analysis GT-HO). Figures 5c and 7 illustrate the results related to the naturalness of the circled arrow. Figure 7 shows the result about the average naturalness of motion of the circled arrow on a Likert scale from 1 (minimum naturalness) to 5 (maximum naturalness), computed over the 100 iterations of the bootstrapping procedure.
In accordance with our expectations, HO exhibits the highest mean degree of naturalness (4) with a standard deviation of 0.04, closely followed by GT (3.5) with a standard deviation of 0.04, whereas a larger gap separates VFH (2.6) with a standard deviation of 0.05. Importantly, although significantly different from HO, GT values exceeded three. This may indirectly suggest that while HO videos were deemed natural, also GT videos may have been regarded as human-like. Yet, this proposition is currently speculative whereby the intermediate value (three) was not marked with the anchor "natural". Therefore, future studies are needed to precisely detail the individual appraisal of the naturalness of the GT trajectory.

Discussion
The main goal of our study was to design a navigation system for autonomous robots moving through populated environments, characterized by a high degree of acceptability by humans. Specifically, in light of the increasing use of autonomous robots in real life, we tested whether a navigation system designed through the principles of game theory would generate indistinguishable trajectories from those walked by human beings. To this aim, we first leveraged game theory to develop a model capable of predicting the intention of motion of humans in populated environments and then, based on this model, we devised a trajectory planning algorithm for a mobile robot. Finally, to assess the social acceptance of the generated robotic trajectories, we conducted a survey questionnaire on a statistically robust group of volunteers using a variation of the Turing test. For greater completeness and toward even more robust outcomes, before analyzing the results collected from the survey questionnaire, we also analyzed the geometrical features of the robotic trajectories, generated in the three experimental conditions (HO, GT, and VFH), selecting three performance parameters from the state of the art (PLR, PR, CPD). The ranking obtained through this analysis (HO, GT, VFH) is consistent with the results obtained through the Turing test, except for the closest pedestrian distance (CPD), in which the trajectories generated by our planner (GT) exhibit higher values of the parameter than those measured in environments populated by humans only (HO). We hypothesize that this exception is due to the fact that our model guarantees, by design, a minimum safe distance to pedestrians to prevent collisions and to ensure, in any case, a comfortable action space. On the other hand, humans on the walk are more flexible in this respect, and evaluate circumstances on a case by case basis. While the outcome of the Turing test is consistent with the analysis of the performance parameters of the trajectories, the statistical analysis (Kruskal-Wallis) executed on the latter shows that this finding is not statistically significant. To explain this non-statistically significant result, we point out that the statistical analysis was conducted on only seven experiments per group, with differential degree of variability, and thus characterized by a limited statistical power.
Moreover, to preliminarily evaluate the quality of the trajectories, we conducted a systematic analysis (Levene's test) to assess the degree of variability of the different scenarios. In other terms, we evaluated the extent to which each algorithm generated videos that were similar to one another. This analysis revealed that there exists a significant variability with respect to the path regularity (PR), whereby the videos with the robot controlled by the VFH are the most variable, compared to the HO and GT experimental conditions. This finding suggests that the VFH algorithm is less predictable (i.e., it provides less regular results) than both our algorithm and a real pedestrian.
The variant of the Turing test comprises a first part that functions as a training phase. The second part comprises two consecutive phases. The first phase is devoted to compare the social acceptability of trajectories generated by either our game-theoretical algorithm (GT) or a state of the art algorithm (VFH) against a reference www.nature.com/scientificreports/ experimental condition, a complex social environment populated by humans only (HO). To this aim, participants were asked to say if they perceived weirdness in HO, GT, or VFH experimental conditions. The statistical test confirms that the perceived weirdness in trajectories in which only human subjects are involved is statistically indistinguishable from trajectories where the GT-controlled robot and human subjects coexist. Conversely, the videos in which the trajectories are generated by the VFH algorithm are perceived with a remarkably higher degree of overall weirdness compared with either HO or GT scenarios.
In the second phase of the second part of the test, participants were asked to indicate which is the perceived "weird" arrow, if any. In this regard, we observed that the trajectories generated by the VFH algorithm were more frequently recognized as "weird" than those generated by our GT algorithm.
In the third part of the test, participants were requested to focus on a circled arrow (a human in the HO experimental condition, a robot in GT and VFH ones), and were asked to evaluate whether or not the motion of the circled arrow corresponded to human recordings, and then rate their degree of naturalness. We observed that, while the arrow in VFH scenario was perceived as not human-like, the arrow controlled through GT was considered human-like, albeit not as human-like as the one rated in the HO experimental condition. We believe that this result is related to the fact that, in this part of the test, participants were asked to focus on one arrow only, thus being biased toward detecting an artificial behavior. The same ranking between the three experimental conditions (HO, GT, and VFH) resulted from the analysis of the naturalness of motion of the circled arrow. Indeed, HO has the highest degree of naturalness, closely followed by our GT trajectory planner, and then by the VFH planner.
We can conclude that, if participants were not guided to focus on a particular arrow, they would not distinguish much difference between a real human and a robot controlled through our game-theoretical framework and, therefore, the generated trajectory is a good candidate for social acceptance. This implies that our trajectory planning algorithm would help programming robots to blend well in populated environments, and, hence, to be perceived as more friendly, collaborative, and non-hostile.
Our findings are consistent with other studies in the literature 9 , where a different game-theoretical planner is perceived almost as human-like as human recordings. However, in 9 , the authors created a human-like motion planner for mobile robots, still maintaining a simplified framework that does not comprise, for example, human groups, obstacle avoidance performed by humans, and the human desire of keeping a safe personal space around them 28 . Moreover, their tests only comprise simplified scenarios: a first test with either only humans, or only robots; a second test in which the participant, based on virtual reality, interacts with an agent who can move as a human or a robot. In our study, we went one step further in modeling (including the personal space, the group recognition, and the human-obstacle interaction) but also in the design of the variation of the Turing test (considering a real case scenario in which a robot moves in a human populated environment). Nevertheless, it is hard to make extensive comparisons with other approaches, as the literature on variants of the Turing tests for assessing social acceptability of a robot agent is scant.
Notably, the literature reports three main methods to evaluate the human-likeness and the social acceptance of robot navigation: (i) definition of social rules or performance parameters and, then, assessment of the adherence of the robot motion planner to these principles 54-56 ; (ii) comparison between simulated trajectories and observed pedestrian behavior 57 ; (iii) survey questionnaire based on a variation of the Turing test 9,58 . The main limitation of the first two methods is that they do not consider how humans perceive the robot. However, these methods can be applied to evaluate, as a preliminary test, some features of the generated trajectories. Indeed, our analysis of the performance parameters of the generated trajectories falls within the first methodology, whereas the second methodology has been used as a validation criterion for our game-theoretical model of pedestrian motion.
Hence, toward our aims, we deemed the Turing as an effective means to study the human-likeness and the social acceptability of the generated trajectories.
Unlike the Kretzschmar's 58 and Turnwald's 9 tests, where volunteers watched videos in which the totality of agents moved either in an artificial way or as real pedestrians, our survey questionnaire completely changes such a perspective. In fact, our test videos reproduce a true use case scenario of the algorithm (an environment populated by people with a single robot moving within), where the real nature of agents is masked and made uniform to eliminate any participants' bias. Moreover, unlike Kretzschmar's test 58 , where the Turing test is executed only on 10 participants, we performed an a priori power analysis to infer the correct sample size to obtain statistically significant results. Due to the largely superior size of collected data than the outcome of the power analysis, we carried out a 100-iteration bootstrap, always getting consistent results across iterations, highlighting the robustness of our results and further corroborating our hypothesis.
When interpreting the results of our study, we should also acknowledge the limitations of the model and of the test design. Regarding the former, our model does not take into account the uncertainties that arise from the interaction with the external world. Importantly, the stochasticity of human behavior is not explicitly modeled, although this is implicitly accounted for through tuning model parameters identified from real trajectory data, extracted from surveillance cameras. A range of simplifying assumptions were in order to handle the computational complexity of the algorithm. The main one resides in the discrete nature of our model, whereby each agent can choose between a fixed number of motion directions-an indispensable trade-off between predictive accuracy and computational effort. Moreover, the designed human motion model has been devised to operate with a limited number of pedestrians: its computational complexity may be difficult to manage if the number of agents increases to more than a dozen. The pedestrian model used in this study only considers people's goaldirected and collision-avoiding behaviors, while ignoring other social activities that humans may perform in a pedestrian urban scenario, such as waiting for a bus or wandering without a clear direction. Thus, any pedestrian behavior that is not contemplated by our model breaks the assumptions under which our system works. In addition, our method does not allow customization of trajectories. For example, the prediction of a trajectory walked by an elderly person may be coincident with that of a child. www.nature.com/scientificreports/ The main limitation of the test design is the choice of the navigation algorithm chosen for comparison (VFH). Ideally, more than one algorithm should have been selected in order to mitigate algorithm-induced biases. However, since the execution of the Turing test already took about 20 minutes to the average participant, we preferred to limit our comparison to only one algorithm at the state of the art, in order to avoid increasing the time of the experiment for each participant, mitigate attention biases and, in the end, achieve robust results.
Our work can be extended along several directions. To manage and predict the motion of big crowds, meanfield games could be adopted 59 . We remark, however, that crowded and populated scenarios are different under many aspects, and the deployment of a robot in the two scenarios would cover totally different application fields.
The lack of customization in the inference of trajectories by our model can be mitigated by combining our approach with learning strategies as in 60 , encompassing variegate behaviors across the experimental scenario. In fact, adding variability to the pedestrian model might allow for a more accurate prediction of human motion pattern and should allow the robot to better adapt to the needs of the human with whom it is interacting. For example, if a robot recognizes a person who has difficulties in walking, the robot should be able to predict their movement and possibly reduce its speed. Moreover, it would be interesting to understand and assess the quality of our generated trajectories considering not only social acceptability but also the comfort 11 feeling of participants, for instance by creating a real shared environment with humans and a robot.

Statement about methods.
All methods were carried out in accordance with relevant guidelines and regulations described in the methods section.
Ethical approval and informed consent. The experimental protocol regulating the administration of the Turing test to human subjects, the evaluation of the results, and the data management plan was approved by the ethical committee of the Istituto Superiore di Sanità (Italian Institute of Health) with approval code AOO-ISS 10/07/2020 -0024079, Class: PRE BIO CE 01.00. Each participant also provided informed consent, after the explanation of the nature and possible consequences of the study.

Data and code availability
The code used for the generation of the trajectories and the anonymized data collected during the experiments are available through the following link: https://gitlab.com/PoliToComplexSystemLab/game-theoretic-trajectoryplanning.git.