
# Differential game theory for versatile physical human–robot interaction

## Abstract

The last decades have seen a surge of robots working in contact with humans. However, until now these contact robots have made little use of the opportunities offered by physical interaction and lack a systematic methodology to produce versatile behaviours. Here, we develop an interactive robot controller able to understand the control strategy of the human user and react optimally to their movements. We demonstrate that combining an observer with a differential game theory controller can induce a stable interaction between the two partners, precisely identify each other’s control law, and allow them to successfully perform the task with minimum effort. Simulations and experiments with human subjects demonstrate these properties and illustrate how this controller can induce different representative interaction strategies.

## Main

Traditional robotic manipulators, such as those used in the automotive industry, are potentially dangerous to human workers and thus are kept separated from them. However, robotic systems have been increasingly used in recent decades to work in physical contact with humans. While the potential of such contact robots is suggested by the benefits observed during physical tasks carried out between humans1,2, they have so far made little use of the opportunities of interactive control3: either they are fully controlled by the operator (as in ‘master–slave’ behaviour mode4,5,6), or, conversely, their control law does not adapt during movement to the interaction with the user, for example in typical rehabilitation robots7. To efficiently assist a human user, a contact robot should arguably react and adapt to their specific motion behaviour according to a desired control strategy.

How should a contact robot be controlled to provide a stable and appropriate response to a user with unknown dynamics during various activities ranging from sport training, to physical rehabilitation and shared driving8,9,10? Specific human–robot interactions have been studied11,12, but a general framework for interactive control is still missing. It has been suggested that differential game theory (GT) can be used as a framework to describe various interactive behaviours between a robot and its human user13. However, GT typically assumes knowledge of the partner’s dynamics and control strategies14,15, while a contact robot cannot a priori know the sensorimotor control of the human user.

On the other hand, the study of physical tasks between humans has revealed how the capability to understand the partner’s sensorimotor control and to adapt one’s own control is key to interaction benefits16. Some recent papers have studied how finite games can deal with limited knowledge of the partner17,18, but it is unclear how the available techniques should be transferred to the continuous control of the interaction with a robot. A method to model the interaction force of a contact robot with its user has been introduced in ref. 19, but in an approach requiring force sensing and restricted to the ‘collaboration’ between equals13. However, different control strategies are required for various tasks, such as ‘less-than-needed assistance’ to promote successful learning in physical rehabilitation20, or a deceptive behaviour needed for ‘competition’ that can be used to challenge a user21.

Therefore, our goal is to develop a versatile interactive motion behaviour for a robot in physical contact with a human user, with which these two agents can optimally react to each other by learning each other’s control. For this purpose, we model the robot’s and human’s task objectives through respective cost functions in a GT framework, which enables us to specify the desired control strategy13. However, the mechanical connection between the partners means that their individual behaviours will interfere and may lead to instability. It is unclear how to ensure stability of the human–robot system and task completion while identifying the human user’s control strategy and adapting the robot’s control. To address these issues for simultaneous co-adaptation and control, we propose using adaptive control22 to estimate the partner’s cost function and GT to compute a Nash equilibrium and resolve possibly conflicting situations where each partner optimizes their own behaviour based on the available knowledge23. This adaptive GT controller is validated on reaching arm movements. It does not require measurement of the interaction force, and is proved to yield a stable interaction in typical conditions. The partner’s controller model can be used to specify different types of behaviour, which we illustrate in experiments with human subjects exhibiting less-than-needed assistance required for physical rehabilitation and a competitive behaviour.

## A game theory controller that understands the partner

As human motion planning is carried out essentially in task space24, we describe the control of a contact robot and its human user in the corresponding space {x} of the robot’s end effector, where $$x\in {{{\mathbb{R}}}}^{n}$$. The control can then be transferred to the joint space as described in the Methods. The variables used throughout this Article are listed in Table 1. Let

$$v+{u}_{{{\rm{h}}}}=\Psi (x,\dot{x},{\ddot{x}})$$
(1)

describe how a contact robot (with command $$v\in {{{\mathbb{R}}}}^{n}$$) and a human user (applying a force $${u}_{{{\rm{h}}}}\in {{{\mathbb{R}}}}^{n}$$) move the robot’s dynamics $$\Psi (x,\dot{x},{\ddot{x}})$$. Selecting the mechanical impedance process

$$v\equiv u+{{{\Psi }}}-I{\ddot{x}}-D\dot{x},\,\, I,D\in {{{\mathbb{R}}}}^{n\times n}$$
(2)

to track a common and fixed target $${x}_{\rm{d}}\in {{{\mathbb{R}}}}^{n}$$ (with $${\dot{x}}_{\rm{d}}=0$$), the control equation yields:

$$\begin{array}{l}\dot{\xi }=A\xi +B\left(u+{u}_{{{\rm{h}}}}\right),\,\, \xi \equiv \left[\begin{array}{c}x-{x}_{\rm{d}}\\ \dot{x}\end{array}\right],\\ A\equiv \left[\begin{array}{cc}{0}_{n} & {1}_{n}\\ {0}_{n} & -{I}^{-1}D\end{array}\right],\,\, B\equiv \left[\begin{array}{c}{0}_{n}\\ {I}^{-1}\end{array}\right]\end{array}$$
(3)

where $$u\in {{{\mathbb{R}}}}^{n}$$ will be computed according to linear GT as described below, 0n is the n × n zero matrix, 1n the n × n identity matrix (with 1 as diagonal coefficients and 0 elsewhere), and I and D are the inertia and viscosity matrices specifying the desired interaction dynamics.
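As a concrete illustration, the state-space form of equation (3) can be assembled for a one-dimensional task as follows; this is a minimal sketch, and the inertia and viscosity values are illustrative assumptions, not the article’s settings.

```python
import numpy as np

# Minimal sketch of the task-space dynamics of equation (3) for n = 1.
# I_mat and D_mat (desired inertia and viscosity) are illustrative values.
n = 1
I_mat = 6.0 * np.eye(n)    # desired inertia I
D_mat = 0.2 * np.eye(n)    # desired viscosity D

A = np.block([[np.zeros((n, n)), np.eye(n)],
              [np.zeros((n, n)), -np.linalg.inv(I_mat) @ D_mat]])
B = np.vstack([np.zeros((n, n)), np.linalg.inv(I_mat)])

def xi_dot(xi, u, u_h):
    """State derivative of equation (3); xi = [x - x_d, x_dot]."""
    return A @ xi + B @ np.atleast_1d(u + u_h)
```

With zero inputs, a pure velocity state decays according to the viscosity-to-inertia ratio, as the lower-right block of A prescribes.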

Using non-cooperative differential GT15, we can describe the robot’s interaction with the human user (equation (3)) as a game between two players who minimize their respective cost functional

$$\begin{array}{c}U\equiv {\int }_{{t}_{0}}^{\infty }{\xi }^{{{\rm{T}}}}(t)Q\xi (t)+{u}^{{{\rm{T}}}}(t)u(t){\rm{d}}t,\\ {{\widehat{U}}}_{{{\rm{h}}}}\equiv {\int }_{{t}_{0}}^{\infty }{\xi }^{{{\rm{T}}}}(t){\widehat{Q}}_{{{\rm{h}}}}\xi (t)+{{\hat{{{u}}}}}_{{{\rm{h}}}}^{{{\rm{T}}}}(t){{\hat{{{u}}}}}_{{{\rm{h}}}}(t){\rm{d}} t\end{array}$$
(4)

where (·)T represents the transpose operator and $$\hat{(\cdot )}$$ the estimate of (·). These cost functionals mean that the robot knows its cost U and estimates the cost $${{\hat{{{U}}}}}_{{{\rm{h}}}}$$ of the human user, where $${{\hat{{{u}}}}}_{{{\rm{h}}}}$$ is the estimation of their motor command. Each of the constant weight matrices Q and $${\widehat{Q}}_{{{\rm{h}}}}$$ can be positive semi-definite or negative semi-definite. In these equations, each player fulfils the reaching task by minimizing the error to the target while using minimal metabolic cost. Since the state ξ includes two parts, the target error x − xd and the velocity $$\dot{x}$$, Q and $${\widehat{Q}}_{{{\rm{h}}}}$$ include two components corresponding to position regulation and viscosity, respectively. Based on differential GT for linear systems25, the following control inputs of the robot and its modelled human user minimize the cost functionals (equation (4)) in the sense of the Nash equilibrium:

$$u=-L\xi ,\,L\equiv {B}^{{{\rm{T}}}}P$$
(5a)
$${A}_{{{\rm{r}}}}^{{{\rm{T}}}}P+P{A}_{{{\rm{r}}}}+Q-PB{B}^{{{\rm{T}}}}P={0}_{2n},\,{A}_{{{\rm{r}}}}\equiv A-B{\widehat{L}}_{{{\rm{h}}}}$$
(5b)
$${{\hat{{{u}}}}}_{{{\rm{h}}}}=-{\widehat{L}}_{{{\rm{h}}}}\xi ,\,{\widehat{L}}_{{{\rm{h}}}}\equiv {B}^{{{\rm{T}}}}{\widehat{P}}_{{{\rm{h}}}}$$
(5c)
$${A}_{{{\rm{h}}}}^{{{\rm{T}}}}{\widehat{P}}_{{{\rm{h}}}}+{\widehat{P}}_{{{\rm{h}}}}{A}_{{{\rm{h}}}}+{\widehat{Q}}_{{{\rm{h}}}}-{\widehat{P}}_{{{\rm{h}}}}B{B}^{{{\rm{T}}}}{\widehat{P}}_{{{\rm{h}}}}={0}_{2n},\,{A}_{{{\rm{h}}}}\equiv A-BL$$
(5d)

where L ≡ [Le, Lv] (obtained using the solution of the Riccati equation (equations (5a,b))) is the feedback gain of the robot’s control and $${\widehat{L}}_{{{\rm{h}}}}\equiv \left[{L}_{{{\rm{h}}},{{\rm{e}}}},{L}_{{{\rm{h}}},{{\rm{v}}}}\right]$$ (using equations (5c,d)) is the robot’s estimate of the human feedback gain. Note how the robot’s and modelled human’s controls depend on each other through Ar and Ah, characterizing the coupled optimization.
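A minimal sketch (not the authors’ implementation) of computing these coupled Nash gains for a scalar task: each player repeatedly solves its Riccati equation, (5b) or (5d), against the dynamics shifted by the partner’s current gain, iterated to a fixed point. All numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Fixed-point iteration on the coupled Riccati equations (5a-d), n = 1.
I_m, D_m = 6.0, 0.2
A = np.array([[0.0, 1.0], [0.0, -D_m / I_m]])
B = np.array([[0.0], [1.0 / I_m]])
Q  = np.diag([100.0, 0.0])   # robot's weight (illustrative)
Qh = np.diag([100.0, 0.0])   # (estimated) human's weight (illustrative)
R  = np.eye(1)               # unit effort weight, as in equation (4)

L, Lh = np.zeros((1, 2)), np.zeros((1, 2))
for _ in range(200):
    P  = solve_continuous_are(A - B @ Lh, B, Q, R)    # equation (5b)
    L  = B.T @ P                                      # equation (5a)
    Ph = solve_continuous_are(A - B @ L, B, Qh, R)    # equation (5d)
    Lh = B.T @ Ph                                     # equation (5c)
```

At the fixed point the two Riccati residuals vanish simultaneously, which is the coupled optimization the text describes through Ar and Ah.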

Importantly, the human user’s control gain Lh is not a priori known to the robot. However, it can be estimated from equation (3) without requiring force sensing (that is, measurement of uh) for actions along non-gravity-affected directions, while ensuring stability of the closed-loop system. For this purpose, we first express how the estimated command $${{\hat{{{u}}}}}_{{{\rm{h}}}}$$ maps the state ξ to its estimate $$\widehat{\xi }$$:

$$\dot{\hat{\xi }}=A\widehat{\xi }+B\left(u+{{\hat{{{u}}}}}_{{{\rm{h}}}}\right)-\Gamma \widetilde{\xi },\,\widetilde{\xi }\equiv \widehat{\xi }-\xi.$$
(6)

We use this equation as an observer for uh, where Γ is a matrix to make Γ − A positive definite, $$\widehat{\xi }$$ the estimate of ξ and $$\widetilde{\xi }$$ the state estimation error. Subtracting equation (3) from equation (6) then yields

$$\dot{\widetilde{\xi }}=\left(A-\Gamma \right)\widetilde{\xi }-B{\widetilde{L}}_{{{\rm{h}}}}\xi ,\,{\widetilde{L}}_{{{\rm{h}}}}\equiv {\widehat{L}}_{{{\rm{h}}}}-{L}_{{{\rm{h}}}}.$$
(7)

As the state estimation error $$\widetilde{\xi }$$ is due to the error $${\widetilde{L}}_{{{\rm{h}}}}$$, we develop an update law for $${\widehat{P}}_{{{\rm{h}}}}$$ (thus for $${\widehat{L}}_{{{\rm{h}}}}\equiv {B}^{{{\rm{T}}}}{\widehat{P}}_{{{\rm{h}}}}$$) to minimize $$\widetilde{\xi }$$. For this purpose, we consider the Lyapunov function candidate

$$V\equiv \frac{1}{2}{\xi }^{{{\rm{T}}}}\xi +\frac{1}{2}{\widetilde{\xi }}^{{{\rm{T}}}}\widetilde{\xi }+\frac{1}{2\alpha }{{\rm{tr}}}\left({\widetilde{L}}_{{{\rm{h}}}}^{{{\rm{T}}}}{\widetilde{L}}_{{{\rm{h}}}}\right),\,\alpha > 0$$
(8)

where tr(M) is the trace of matrix M. Its time derivative yields

$$\begin{array}{c}\dot{V}={\xi }^{{{\rm{T}}}}\dot{\xi }+{\widetilde{\xi }}^{{{\rm{T}}}}\dot{\widetilde{\xi }}+\frac{1}{\alpha }{{\rm{tr}}}\left({\widetilde{L}}_{{{\rm{h}}}}^{{{\rm{T}}}}{\dot{\widetilde{L}}}_{{{\rm{h}}}}\right)\\ =-{\xi }^{{{\rm{T}}}}\left(BL+B{L}_{{{\rm{h}}}}-A\right)\xi +{\widetilde{\xi }}^{{{\rm{T}}}}\left[\left(A-\Gamma \right)\widetilde{\xi }-B{\widetilde{L}}_{{{\rm{h}}}}\xi \right]+\frac{1}{\alpha }{{\rm{tr}}}\left({\widetilde{L}}_{{{\rm{h}}}}^{{{\rm{T}}}}{\dot{\widetilde{L}}}_{{{\rm{h}}}}\right)\\ \equiv -{\xi }^{{{\rm{T}}}}\left(BL+B{L}_{{{\rm{h}}}}-A\right)\xi -{\widetilde{\xi }}^{{{\rm{T}}}}{\Gamma }_{0}\widetilde{\xi }-{\widetilde{\xi }}^{{{\rm{T}}}}B{\widetilde{L}}_{{{\rm{h}}}}\xi +\frac{1}{\alpha }{{\rm{tr}}}\left({\widetilde{L}}_{{{\rm{h}}}}^{{{\rm{T}}}}{B}^{{{\rm{T}}}}{\dot{\widetilde{P}}}_{{{\rm{h}}}}\right)\\ =-{\xi }^{{{\rm{T}}}}\left(BL+B{\widehat{L}}_{{{\rm{h}}}}-A\right)\xi -{\widetilde{\xi }}^{{{\rm{T}}}}{\Gamma }_{0}\widetilde{\xi }-{\left(-\xi +\widetilde{\xi }\right)}^{{{\rm{T}}}}B{\widetilde{L}}_{{{\rm{h}}}}\xi +\frac{1}{\alpha }{{\rm{tr}}}\left({\widetilde{L}}_{{{\rm{h}}}}^{{{\rm{T}}}}{B}^{{{\rm{T}}}}{\dot{\widetilde{P}}}_{{{\rm{h}}}}\right)\end{array}$$
(9)

for some positive definite matrix Γ0 ≡ Γ − A, where we have used the system dynamics (equation (3)) and estimation error dynamics (equation (7)). Setting

$${\dot{\widehat{P}}}_{{{\rm{h}}}}\equiv \alpha \left(\widetilde{\xi }-\xi \right){\xi }^{{{\rm{T}}}},\,\alpha > 0$$
(10)

the last two terms of equation (9) cancel, yielding a non-positive $$\dot{V}$$, while $${\widehat{Q}}_{{{\rm{h}}}}$$ can be computed from $${\widehat{P}}_{{{\rm{h}}}}$$ using equation (5d). Equation (10) is critical as it enables identifying the control of the partner in order to adapt one’s own control using equation (5). The above derivation leads to the following theorem (demonstrated in the Methods) that summarizes our results on co-adaptation and simultaneous control:
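The observer of equation (6) and the update law of equation (10) amount to two coupled integrations; a minimal Euler-step sketch, with illustrative parameter and step-size values:

```python
import numpy as np

# One Euler step of the observer (equation (6)) and the update law (equation (10)).
# dt, alpha, Gamma and the dynamics values are illustrative assumptions.
I_m, D_m, dt, alpha = 6.0, 0.2, 1e-3, 1e4
A = np.array([[0.0, 1.0], [0.0, -D_m / I_m]])
B = np.array([[0.0], [1.0 / I_m]])
Gamma = np.diag([10.0, 1.0])   # chosen so that Gamma - A is positive definite

def observer_step(xi, xi_hat, P_h_hat, u):
    """Advance the state estimate and the estimate of the human's Riccati matrix."""
    L_h_hat = B.T @ P_h_hat                      # equation (5c)
    u_h_hat = -L_h_hat @ xi                      # estimated human command
    xi_tilde = xi_hat - xi                       # state estimation error
    xi_hat_dot = A @ xi_hat + B @ (u + u_h_hat) - Gamma @ xi_tilde  # equation (6)
    P_h_hat_dot = alpha * np.outer(xi_tilde - xi, xi)               # equation (10)
    return xi_hat + dt * xi_hat_dot, P_h_hat + dt * P_h_hat_dot
```

Note that when the state estimate is exact (ξ̃ = 0), equation (10) still updates the estimate in proportion to −ξξᵀ, which is what cancels the cross term in equation (9).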

Considering the system of equation (3), if the robot and human estimate the partner’s controller and develop their own control as in equations (5), (6) and (10), then:

• the closed-loop system is stable and u, uh, ξ are bounded

• the partners’ controllers and cost functions converge to the correct values if ξ is persistently exciting

• the Nash equilibrium is achieved, that is, the cost functions in equation (4) will increase if either of the control inputs differs from those in equation (5)

Importantly, the interactive controller with co-adaptation and partner’s identification described in equations (5), (6) and (10) can be used to implement representative interaction control behaviours such as those identified in ref. 13. ‘Co-activity’, when two agents ignore each other’s dynamics, is implemented by using, for each agent’s control, the respective part of equations (4) and (5) with $${\widehat{P}}_{{{\rm{h}}}}\equiv 0$$ and $$\widehat{P}\equiv 0$$, where $$\widehat{P}$$ is the human’s estimate of P, yielding two independent linear quadratic regulators (LQR). Collaboration between two partners without hierarchy arises when they contribute to the dynamics (equation (3)) using each controller of equation (5) with positive definite {Q, Qh}. Competition can be implemented with a negative definite gain matrix Q for the robot, in order to challenge the human user in achieving their goal by pushing them away from it, which will promote active learning20. ‘Cooperation’ arises when the two agents take complementary roles, which can be defined through the sharing rule

$$Q+{Q}_{{{\rm{h}}}}\equiv C$$
(11)

enabling the GT controller to continuously modify the contributions between the partners26. Unlike collaboration, cooperation fixes the task performance through the total weight C and uses equation (11) to share the effort. A special case of cooperation, ‘assistance’, arises when C is set to let the robot fulfil the task alone without interaction with a human user. As humans tend to relax during motor actions27,28, the robot will gradually take over the whole task load while the human will become passive. Another special case of cooperation particularly important in physical training is assist-less-than-needed7,20, designed to keep a human trainee engaged during sport practice or physical rehabilitation. This interaction control mode is implemented by setting C such that the robot falls short of reaching the target alone, which induces the human to increase their effort in order to reach the target.
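The sharing rule of equation (11) reduces to a one-line update of the robot’s weight; a minimal sketch, with an illustrative value of C:

```python
import numpy as np

# Sketch of the cooperative sharing rule of equation (11): the total task weight C
# is fixed, and the robot's weight is whatever the estimated human weight leaves over.
# The value of C is an illustrative assumption; assist-less-than-needed sets C below
# what full task completion would require.
C = np.diag([100.0, 0.0])

def robot_weight(Q_h_hat):
    """Q = C - Q_h_hat: the robot fills the gap left by the (estimated) human."""
    return C - Q_h_hat
```

A passive user (estimated weight near zero) makes the robot take over the whole weight, while a strongly engaged user can drive Q negative, that is, into resistive (competitive) behaviour.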

The steps of the proposed algorithm are summarized in the following pseudo-code:

Input: Current state ξ, target xd.

Output: Robot’s control input u, estimated human’s cost function weight $$\hat Q_h$$ in equation (5d).

begin

Define the target position xd, initialize Q, $$\hat Q_h$$, u, $$\hat u_h$$, $$\hat \xi$$, $$\hat P_h$$, set the parameters $$\Gamma$$ in equation (6), α in equation (10), and the terminal time tf of one trial. In the case of cooperation set also C.

while t < tf do

• Measure the position x and velocity $$\dot x$$, and form the state ξ.

• Calculate the estimation error $$\tilde \xi$$ and the estimated state $$\hat \xi$$ using equation (6): $$\tilde \xi \equiv \hat \xi - \xi ,\,\dot{\hat{\xi}} = A\hat \xi + B(u + \hat u_h) - \Gamma \tilde \xi$$.

• Update the matrix $$\hat P_h$$ using equation (10): $$\dot{\hat P}_h \equiv \alpha (\tilde \xi - \xi )\xi ^T$$. Then compute the estimated human’s control gain $$\hat L_h \equiv B^T\hat P_h$$ and motor command $$\hat u_h \equiv - \hat L_h\xi$$.

• Solve the Riccati equation in equation (5b) to yield P, and compute the robot’s control input $$u = - B^TP\xi$$.

• Calculate the estimated human’s cost function weight $$\hat Q_h$$ according to the Riccati equation in equation (5d).

• In the case of cooperation, use equation (11) to adapt the corresponding cost function weight $$Q = C - \hat Q_h$$.

end
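Under stated assumptions (a scalar task, and a human simulated as a fixed state-feedback controller whose gain is hidden from the robot), the pseudo-code above can be sketched as a runnable loop; all numerical values are illustrative, not the article’s settings.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Runnable sketch of the algorithm above for a scalar (n = 1) reaching task.
I_m, D_m, dt, alpha = 6.0, 0.2, 0.01, 100.0
A = np.array([[0.0, 1.0], [0.0, -D_m / I_m]])
B = np.array([[0.0], [1.0 / I_m]])
Gamma = np.diag([10.0, 1.0])                 # observer gain (Gamma - A pos. def.)
Q = np.diag([100.0, 0.0])                    # robot's cost function weight
L_h_true = np.array([[20.0, 2.0]])           # simulated human gain, hidden from robot

xi = np.array([0.1, 0.0])                    # start 10 cm from the target
xi_hat = xi.copy()
P_h_hat = np.zeros((2, 2))

for _ in range(1000):                        # one 10 s trial
    L_h_hat = B.T @ P_h_hat                                     # equation (5c)
    P = solve_continuous_are(A - B @ L_h_hat, B, Q, np.eye(1))  # equation (5b)
    u = -(B.T @ P) @ xi                                         # equation (5a)
    u_h = -L_h_true @ xi                     # the human's (hidden) control
    u_h_hat = -L_h_hat @ xi
    xi_tilde = xi_hat - xi
    # Observer (equation (6)) and update law (equation (10)), Euler-integrated:
    xi_hat = xi_hat + dt * (A @ xi_hat + B @ (u + u_h_hat) - Gamma @ xi_tilde)
    P_h_hat = P_h_hat + dt * alpha * np.outer(xi_tilde - xi, xi)
    xi = xi + dt * (A @ xi + B @ (u + u_h))                     # equation (3)
```

Re-solving the Riccati equation inside the loop follows the pseudo-code literally; an implementation on hardware could update it at a lower rate.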

## Simulations and experiments

To test the interactive control and co-adaptation of the two agents, we simulated a neurorehabilitation scenario of arm reaching movements back and forth between −10 cm and +10 cm (Fig. 1). The robot’s control was computed from equations (5), (6) and (10) to generate u, the simulated human’s control used a similar set of equations to generate uh, and both inputs drove the task dynamics according to equation (3). Both agents were assumed to have no initial knowledge of the partner’s control (thus, initially $${\widehat{P}}_{{{\rm{h}}}}\equiv 0$$ and $$\widehat{P}\equiv 0$$) while equations (6) and (10) were used to estimate it. Figure 1a shows that the collaborative GT controller yields a stable behaviour fulfilling the reaching task. With this controller, the human control gain was identified with an error smaller than 5% of the real value within 10 s.

Could the reaching task also be fulfilled using a simpler control law neglecting the partner’s control? We addressed this question by setting $${\widehat{P}}_{{{\rm{h}}}}=0$$ and $$\widehat{P}=0$$ in equation (5) for the simulated robot and human agents, respectively. This leads to the simplified solution of two agents with independent LQR ignoring the interaction with the partner. Simulation of the reaching task, displayed in the dashed blue lines of Fig. 1a, suggests that while the task is generally fulfilled with the independent LQR, it requires larger control gains Lh, thus a larger effort compared with the GT controller. Both the LQR gains and the GT gains depend on the predefined cost functionals. To investigate the effect of the costs systematically, we examined the gains in LQR and GT when Qh varied from Q/10 to 10Q. We see in Fig. 1b that the LQR gains are always larger than the GT gains, with the difference becoming smaller as Qh increases, as the robot’s relative influence decreases. The GT controller considers the interaction with the simulated human (through equation (5b)) and computes the minimal force for the robot to fulfil the task with them. In turn, the simulated human can also minimize their effort.
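This comparison can be sketched numerically: the independent-LQR gain (partner ignored, $$\hat P_h = 0$$) against the game-theoretic Nash gain for the same scalar dynamics and identical weights. Parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# LQR-versus-GT gain comparison for a scalar task (illustrative values).
I_m, D_m = 6.0, 0.2
A = np.array([[0.0, 1.0], [0.0, -D_m / I_m]])
B = np.array([[0.0], [1.0 / I_m]])
Q = np.diag([100.0, 0.0])
Qh = Q.copy()

# Independent LQR: each agent optimizes against A alone, ignoring the partner.
L_lqr = B.T @ solve_continuous_are(A, B, Q, np.eye(1))

# GT (Nash) gains: iterate the coupled Riccati equations (5a-d) to a fixed point.
L, Lh = np.zeros((1, 2)), np.zeros((1, 2))
for _ in range(200):
    L  = B.T @ solve_continuous_are(A - B @ Lh, B, Q, np.eye(1))
    Lh = B.T @ solve_continuous_are(A - B @ L, B, Qh, np.eye(1))
```

Consistent with the comparison reported for Fig. 1b, the GT position gain comes out smaller than the LQR one: each player counts on the partner’s contribution to the shared dynamics.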

Is the partner’s control identification really necessary, or could any approximate partner’s model be used instead? To test this question we carried out reaching simulations with a biased identification. Figure 1c,d show that a bias in the partner’s velocity estimation can bring instability to the system and prevent it from reaching the target. The theorem’s demonstration in the Methods shows that both the bias level leading to instability and the relative effect of position and velocity gains on stability depend on the system matrix A and on the cost functionals. Using these cost functionals with the values set in the Methods, both position and velocity gains are positive regardless of the estimation bias. However, the system matrix A is unstable due to the negative damping. Therefore, the closed-loop system becomes unstable when the velocity gain cannot compensate for this damping, which occurs for a velocity-gain estimation bias larger than 50 N s m–1. Figure 1c,d illustrate the system’s behaviour in the stable case of a 30 N s m–1 bias and in the unstable case of a 50 N s m–1 bias, respectively. These results illustrate the importance of accurately identifying the partner’s control based on a sound analysis of the dynamics involved, as was developed in equations (6)–(10).

To test the ability of the controller to adapt to any partner’s motor condition, we simulated the scenario of an individual with motor weakness who uses the robot to help recover their motor function (Fig. 2a–c). We assumed that the simulated human improves their arm control by increasing the weight Qh trial after trial as described in the Methods. Robot interaction was adapted after each trial based on the estimated weight of the simulated human’s cost function. To promote active learning while still assisting a weak user, the simulated robot used an assist-less-than-needed cooperation strategy, with equation (11) and C making the robot short of reaching the target alone. We see that while the reaching task is completed with similar performance in all trials (Fig. 2c), this is achieved with different contributions of the simulated human and robot (Fig. 2a,b). Specifically, Fig. 2a,b shows how the robot updates its estimate of the simulated human motor control weight Qh and reduces its own weight Q correspondingly.

Interestingly, it appears in Fig. 2b that the robot’s weight in the fifth iteration decreases to a negative value, that is, the robot provides resistance (instead of assistance) to the simulated human. This form of competition is desirable as it will let the user try to improve performance and be engaged in the physical training. However, the robot will not force the trainee beyond their motor capabilities, as it will keep compensating for the missing dynamics through equation (11). In turn, this illustrates the capability of the adaptive GT controller to induce various types of interactive behaviour, namely here, less-than-needed assistance and competition. The simulation results also illustrate that the two components in the weight matrices for reaching and viscosity (that may correspond to spasticity in a neurologically impaired user29) can be modulated separately, which makes it possible to implement requirements in various conditions.

Considering the scenario of an impaired user such as a stroke survivor using the robot to train their motor control, their behaviour may be erratic, with irregular progress and setbacks over consecutive training sessions. To test the capability of our interactive controller to deal with such erratic behaviours, we simulated a human user whose performance varied trial after trial, implemented by random changes in the corresponding cost function. Figure 2d shows that even in this situation the robot can adapt to the changing performance by updating its own cost function weight in a suitable way trial after trial.

How will our designed interaction behaviour and adaptation work with real human users? We tested this by implementing less-than-needed assistance control during arm reaching (on the robotic interface shown in Fig. 3a) as illustrated in Fig. 3b. The red traces in Fig. 3c illustrate the robot’s reaching behaviour without interaction with a user. We observe a nearly constant motion in consecutive trials with an estimated weight of the human user at ≈0, as expected. The movement falls 3 cm short of the target due to the low C parameter set in the controller via equation (11). The pink traces exhibit the adaptation taking place in the healthy subject scenario, where the robot assists a representative user instructed to complete the reaching task as well as possible. We observe that the reaching task is effectively carried out, while the robot successfully identifies the increasing contribution of the human user and lowers its assistance correspondingly. Interestingly, the jumps in the weights at the intervals when the movement direction changes stimulate the convergence of the estimated values to the real values. When the subject is instructed to keep the arm passive, in a weak arm scenario, the robot observes this behaviour with a low estimated weight Qh and increases its own weight Q correspondingly.

To test whether the behaviour with our GT controller depends on subject-specific parameters, we repeated the same experiment with ten healthy subjects, who carried out five trials with three back-and-forth reaching movements in each of the healthy and weak arm conditions. Figure 3d shows that the controller does not prevent natural motion variability in consecutive trials, which may facilitate learning20,24. For each subject, the Qh,e value converged reliably, with healthy > no interaction (t-test, max{pi} < 0.009, i = 1…10) and weak arm < no interaction (t-test, max{pi} < 0.012, i = 1…10). This demonstrates that our algorithm is able to clearly identify the different behaviours of each subject and adapt the GT control correspondingly. This is further illustrated in Fig. 3e, where we asked a representative subject to frequently vary his behaviour, as can be expected in neurologically impaired individuals. The result shows that the controller is able to catch these behavioural changes in a suitable way. Altogether, these results demonstrate the efficiency of the simple regulation of assistance during cooperation of the human and robot through the sharing rule (equation (11)) with the GT controller (equations (5), (6) and (10)).

## Discussion

Contact robots that physically interact with humans are being increasingly used in industry and for physical training, but they lack a systematic methodology to produce versatile interactive behaviours. Typical rehabilitation robots to help limb movement training7, or intelligent industrial systems to support heavy objects against gravity and facilitate their manipulation30, use a controller independent of the human user. This co-activity type of interaction strategy, which does not require the robot to observe the human user’s behaviour, works only as long as the robot’s task corresponds to the user’s task. Conversely, in master–slave control the robot fully follows the user’s movement. Master–slave control has been used for teleoperated robots, in force extender exoskeletons31, for redirecting movements to admissible areas in Northwestern University’s cobots32, or for robot-assisted mirror physical therapy where the unaffected arm of a hemiplegic patient drives the impaired arm4,33. These two simple interaction behaviours, where the robot either ignores the human operator’s control (in co-activity) or blindly follows it (in master–slave control), have led to successful systems for specific tasks. However, the sophisticated interaction strategies observed between humans promise a more versatile and flexible interaction optimally combining the two partners’ capabilities1,2,16. Such strategies require an efficient process to understand the partner’s control, which has not been developed in previous interactive controllers.

Does the success of this implementation require that the humans’ motor control corresponds to GT? It is still unknown whether the human central nervous system behaves as predicted by GT to physically interact with other humans, although there is evidence that it considers the partner’s sensorimotor control during common task performance16 and may carry out behavioural decisions that conform to GT35. However, irrespective of whether humans’ sensorimotor control corresponds to GT, the controller of equations (5), (6) and (10) will induce both a stable interaction with the human user (as it adapts its control gain to compensate for the human bounded but possibly unstable control gain), and an interactive behaviour corresponding to a desired control strategy. In turn, the algorithm introduced here may be used to infer how human motor control corresponds to GT, and to determine which characteristic behaviours are adopted during various tasks.

The simulation and implementation results demonstrate that the new adaptive GT controller can flexibly implement different representative behaviours of physical interaction between two agents, as proposed in refs. 1,13. These include collaboration or competition between partners without a hierarchy, the cooperation of complementary partners, as well as less-than-needed assistance promoting engagement and active training. Therefore, this work achieved the programme of ref. 13 by explicating the critical partner’s ‘action understanding’ necessary for efficient interactive control and by developing the versatile differential GT control algorithm to implement these behaviours. In contrast, the GT controller presented in ref. 19 could only implement collaboration and required force sensing.

The algorithm for the simultaneous partner’s control identification and co-adaptation, which was developed for a reaching movement here, may be generalized to more complex movements such as tracking16. It can be used to regulate the interaction between any two autonomous agents, such as two robots, or to design human–robot interaction as was demonstrated in this Article. It may further serve as a tool for modelling bimanual control with brain interhemispherical communication in both healthy individuals and neurologically impaired (for example, cerebral palsy) subjects, as well as for modelling the control of physical interaction in a human pair.

The experiments reported here illustrate that the presented algorithm can also be directly used as a dynamic environment for training impaired reaching behaviour, for example after a stroke. Interesting properties of the adaptive controller for training arm movements include the dynamic interaction not preventing natural motion variability in consecutive trials, as well as the stable and less-than-needed motion assistance, from nearly complete motion guidance to competition; both of these factors are critical for successful neurorehabilitation36,37. The experiment demonstrated the adaptability of the robot control, yielding a challenging but supportive training environment and inducing active inconspicuous adaptation.

## Methods

### Joint space control

The rigid body dynamics of many serial mechanisms can be represented through an equation of the form38

$$\tau =H(q)\ddot{q}+N(q,\dot{q})\dot{q}+G(q)$$
(12)

where q is the joint space vector, H the mass matrix, $$N\dot{q}$$ the Coriolis and centrifugal forces (which can be completed with other velocity-dependent forces such as damping), and G the gravity term. This equation can be transformed to a position-dependent force $${\mathit{\Psi}} (x,\dot{x},{\ddot{x}})$$ using kinematic transformations

$$\dot{x}\equiv J(q)\dot{q},\,x=\int \dot{x}{\rm{d}}t,\,{\ddot{x}}=\dot{J}\dot{q}+J\ddot{q},\,\tau ={J}^{{{\rm{T}}}}F$$
(13)

where x is the robot end effector pose and F the endpoint wrench. For a redundant mechanism, the pseudo-inverse $${J}^{\dagger }={J}^{{{\rm{T}}}}{\left(J{J}^{{{\rm{T}}}}\right)}^{-1}$$ of the Jacobian J can replace $${J}^{-1}$$ to compute the robot endpoint force, as well as its joint angles, angular velocities and accelerations. In the case of a parallel mechanism, the dynamics will generally already be in the form $${\mathit{\Psi}} (x,\dot{x},{\ddot{x}})$$39.
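The kinematic mapping of equation (13) can be sketched for a planar two-link arm; the link lengths and joint configuration below are illustrative assumptions, not the article’s robot.

```python
import numpy as np

# Sketch of equation (13) for a planar 2-link arm, including the pseudo-inverse
# J_dagger = J^T (J J^T)^{-1} used for redundant mechanisms.
l1, l2 = 0.3, 0.25   # link lengths [m], assumed

def jacobian(q):
    """Jacobian of the planar 2-link forward kinematics x = f(q)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def pseudo_inverse(J):
    """J_dagger = J^T (J J^T)^{-1}; equals J^{-1} when J is square and invertible."""
    return J.T @ np.linalg.inv(J @ J.T)

def joint_torque(q, F):
    """tau = J^T F, mapping an endpoint force to joint torques (equation (13))."""
    return jacobian(q).T @ F
```

For this square, non-singular case the pseudo-inverse coincides with the plain inverse, which is a quick self-check of the formula.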

### Theorem demonstration

Substituting update law equation (10) into equation (9) yields

$$\dot{V}=-{\xi }^{{{\rm{T}}}}\left(BL+B{\widehat{L}}_{{{\rm{h}}}}-A\right)\xi -{\widetilde{\xi }}^{{{\rm{T}}}}{\Gamma }_{0}\,\widetilde{\xi }$$
(14)

$$BL+B{\widehat{L}}_{{{\rm{h}}}}-A$$ is positive definite if Q is positive definite, according to equation (5b). As Γ0 is also positive definite, it follows $${{{\rm{lim}}}}_{t\to \infty }\left\Vert \xi (t)\right\Vert =0$$ and $${{{\rm{lim}}}}_{t\to \infty }\left\Vert \widetilde{\xi }(t)\right\Vert =0$$. Therefore, ξ is bounded and $${{{\rm{lim}}}}_{t\to \infty }\left\Vert \dot{\widetilde{\xi }}(t)\right\Vert =0$$ if $${{{\rm{lim}}}}_{t\to \infty }\left\Vert \dot{\widetilde{\xi }}(t)\right\Vert$$ exists. According to the estimation error dynamics (equation (7)) we thus have $${{{\rm{lim}}}}_{t\to \infty }{\widetilde{L}}_{{{\rm{h}}}}(t)\xi (t)={0}_{n\times 1}$$. If ξ is persistently exciting, that is, there exist positive constants t0, T0 and β with $${\int }_{t}^{t+{T}_{0}}\xi (s){\xi }^{{{\rm{T}}}}(s){{\rm{d}}}s\ge \beta {1}_{2n}\forall t\ge {t}_{0}$$, it follows $${{{\rm{lim}}}}_{t\to \infty }\left\Vert {\widetilde{L}}_{h}(t)\right\Vert =0$$, thus $${\widetilde{L}}_{{{\rm{h}}}}(t)$$ is bounded. Lh is assumed to be bounded since it is the human’s control gain. Therefore, $${\widehat{L}}_{{{\rm{h}}}}$$ is also bounded. According to equation (5b), Ar is bounded, thus so are P, L and u. The boundedness of uh follows from the boundedness of Lh and ξ.

Using equation (5d), we can compute the estimation error $${\widetilde{Q}}_{{{\rm{h}}}}={\widehat{Q}}_{{{\rm{h}}}}-{Q}_{{{\rm{h}}}}$$ that is due to the errors $${\widetilde{P}}_{{{\rm{h}}}}$$ and $$\widetilde{P}$$. As both of these errors converge to zero with the human and robot estimations, we have $${\widetilde{Q}}_{{{\rm{h}}}}\to 0$$, that is, the human’s unknown cost function weight Qh is identified by the robot. It can be shown similarly that the robot’s cost function weight Q is identified by the human.

Multiplying the robot equation in equation (5b) by $${\widehat{\xi }}^{{{\rm{T}}}}$$ on the left side and by $$\widehat{\xi }$$ on the right side, and considering the estimation error dynamics equation (7) yields

$$0={\widehat{\xi }}^{{{\rm{T}}}}Q\widehat{\xi }+{\widehat{\xi }}^{{{\rm{T}}}}PB{B}^{{{\rm{T}}}}P\widehat{\xi }+2{\widehat{\xi }}^{{{\rm{T}}}}P\left(\dot{\widehat{\xi }}+\Gamma \widetilde{\xi }\right)\equiv \widehat{\omega }$$
(15)

Taking the limits $$\widetilde{\xi }\to {0}_{2n\times 1}$$ and $$\dot{\widetilde{\xi }}\to {0}_{2n\times 1}$$ yields

$$\omega \equiv {\xi }^{{{\rm{T}}}}Q\xi +{\xi }^{{{\rm{T}}}}PB{B}^{{{\rm{T}}}}P\xi +2{\xi }^{{{\rm{T}}}}P\dot{\xi }\to 0$$
(16)

It can be shown similarly that

$${\omega }_{{{\rm{h}}}}\equiv {\xi }^{{{\rm{T}}}}{Q}_{{{\rm{h}}}}\xi +{\xi }^{{{\rm{T}}}}{P}_{{{\rm{h}}}}B{B}^{{{\rm{T}}}}{P}_{{{\rm{h}}}}\xi +2{\xi }^{{{\rm{T}}}}{P}_{{{\rm{h}}}}\dot{\xi }\to 0$$
(17)

as $${\widetilde{\xi }}_{{{\rm{h}}}}\to {0}_{2n\times 1}$$ and $${\dot{\widetilde{\xi }}}_{{{\rm{h}}}}\to {0}_{2n\times 1}$$, where $${\widetilde{\xi }}_{{{\rm{h}}}}={\widehat{\xi }}_{{{\rm{h}}}}-\xi$$ is the state estimation error of the human. ωh → 0 and ω → 0 indicate that the Nash equilibrium is achieved for the human–robot interaction system25.
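As an illustration, the coupled Riccati equations of the two-player LQ game can be solved numerically by iterating standard single-player Riccati solves, one of the fixed-point schemes discussed in ref. 25; at the fixed point, the residual ω evaluates to approximately zero along the closed-loop dynamics. This sketch borrows the point-mass parameters of the Simulations section (I = 6 kg, D = −0.2 N s m−1, Q = Qh = diag(100, 0)); the iteration scheme is illustrative and is not the observer-based adaptation of this paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Point-mass dynamics with the simulation parameters
m, d = 6.0, -0.2
A = np.array([[0.0, 1.0], [0.0, -d / m]])
B = np.array([[0.0], [1.0 / m]])
R = np.eye(1)
S = B @ np.linalg.inv(R) @ B.T
Q = Qh = np.diag([100.0, 0.0])

# Fixed-point iteration on the coupled game Riccati equations:
# each player solves a standard ARE against the other's current solution.
P, Ph = np.zeros((2, 2)), np.zeros((2, 2))
for _ in range(1000):
    P = solve_continuous_are(A - S @ Ph, B, Q, R)
    Ph = solve_continuous_are(A - S @ P, B, Qh, R)

# Nash residual ω = ξᵀQξ + ξᵀPSPξ + 2ξᵀPξ̇, with ξ̇ = (A - SP - SPh)ξ
A_cl = A - S @ P - S @ Ph
xi = np.array([0.1, -0.05])  # an arbitrary state error
omega = xi @ Q @ xi + xi @ P @ S @ P @ xi + 2 * xi @ P @ A_cl @ xi
print(abs(omega))  # ≈ 0 at the Nash equilibrium
```

Because the two cost functions are identical here, the iteration converges to a symmetric solution P = Ph, and the vanishing residual is the numerical counterpart of equation (16).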

### Simulations

Ten back-and-forth reaching movements were simulated between −10 cm and +10 cm within a total time of 40 s. The simulation parameters were set as: inertia I = 6 kg, damping D = −0.2 N s m−1, observer gain Γ ≡ diag(10, 1) and learning rate α ≡ 10⁴. The collaborative interaction of Fig. 1 was controlled using Q = Qh ≡ diag(100, 0). Biased identification was simulated by adding offsets in [0, 30] N s m−1 and [0, 50] N s m−1 to the estimated robot and human velocity gains, respectively.
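A stripped-down version of one such reach can be simulated as follows. This sketch omits the paper's adaptive observer and assumes both partners apply a known, identical LQR gain (an assumption, reasonable only for the symmetric collaborative case), integrating the error dynamics with the Euler method:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Point mass with the stated parameters (I = 6 kg, D = -0.2 N s/m)
m, d = 6.0, -0.2
A = np.array([[0.0, 1.0], [0.0, -d / m]])
B = np.array([[0.0], [1.0 / m]])
Q = np.diag([100.0, 0.0])
R = np.eye(1)
P = solve_continuous_are(A, B, Q, R)
L = np.linalg.inv(R) @ B.T @ P  # identical gain assumed for both partners

dt, T = 0.001, 4.0               # one 4 s reach of the 40 s simulation
xi = np.array([-0.2, 0.0])       # error: start at -10 cm, target at +10 cm
for _ in range(int(T / dt)):
    u = uh = -(L @ xi)           # robot and human apply the same control
    xi = xi + dt * (A @ xi + B @ (u + uh))

print(np.abs(xi[0]))  # residual position error after one reach (small)
```

With both partners contributing effort, the closed-loop error decays well within the 4 s allotted to each reach, which is the qualitative behaviour reported for the collaborative condition.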

Motor recovery corresponding to the simulation shown in Fig. 2a–c was implemented as an iterative increase of the weight in the human’s cost function with Qh ≡ diag(10, 0) + i diag(60, 0), where i is the trial number, while C ≡ diag(200, 0) was set for the robot to fulfil the reaching task alone. The human’s erratic behaviour shown in Fig. 2d was simulated by using a uniformly distributed random number ρ in the weight of the human’s cost function with Qh ≡ diag(200, 0) + ρ diag(−100, 0).

### Experiment

#### Set-up

The validation with (healthy) human subjects was carried out using the H-Man, a cable-actuated 2-DOF manipulandum40. The version fabricated at Imperial College has a rectangular 24 × 28 cm² workspace and visual feedback co-located with planar arm movements provided on the screen (Fig. 3a,b). The control of the two Maxon motors was carried out at 1,000 Hz using real-time LabVIEW, while the (x, y) position was computed from the motors' optical encoders.

The reaching task comprised three back-and-forth reaching movements between −10 cm and +10 cm within a total time of 12 s, which was long enough for the robot to identify the human user's control. This experiment was approved by the ethics committee at Imperial College (no. 16IC3580). The actual H-Man robot dynamics $${{\mathit{\Psi }}}\equiv I{\ddot{x}}+D\dot{x}$$ were used as the desired dynamics. The following control parameters were used throughout the experiment: desired task performance weight C ≡ 4,000, observer gain Γ ≡ 30, learning rate α ≡ 2 × 10⁴ and initial parameter $${\widehat{P}}_{{{\rm{h}}}}\equiv 0$$. C was tuned so that the robot alone, without interaction, comes within 3 cm of the target.

#### Protocol

Ten naive subjects without known sensorimotor impairment participated in the experiment after giving their informed consent. Each subject sat in front of the robotic interface facing the monitor and moved the handle by following 'move forward' and 'move backwards' instructions displayed on the monitor. Subjects were allowed to practise until they were familiar with the reaching task and the robotic interface. Subsequently, each subject performed five trials (one trial consisting of three back-and-forth reaching movements) in each of the two experimental conditions, 'healthy' and 'weak arm', in which they were instructed to reach the target and to completely relax the arm, respectively. The control condition 'no interaction' was run by the robot alone 50 times.

#### Data analysis

For each subject and each of the 'healthy' and 'weak arm' conditions, a t-test was applied to the last value of Qh,e of the five trials to determine the difference from the 'no interaction' condition. The fourth trial of subject 1 had to be discarded owing to a recording error (the subject started that trial too early), so there were only four values in this case. For each subject the difference was significant, with p < 1.2% as the maximum p value over all subjects.
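A sketch of this analysis with SciPy, using synthetic stand-in data (the identified weight values are not given here, and the exact test variant is not specified, so an independent two-sample t-test and hypothetical numbers are assumed):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic stand-ins for the identified weights (values are hypothetical):
# five final Qh estimates for one subject/condition vs. 50 no-interaction runs.
qh_condition = rng.normal(180.0, 10.0, size=5)
qh_no_interaction = rng.normal(5.0, 3.0, size=50)

t_stat, p_value = ttest_ind(qh_condition, qh_no_interaction)
print(p_value < 0.012)  # significant difference from 'no interaction'
```

Repeating this per subject and condition, and taking the maximum p value over subjects, reproduces the form of the reported statistic.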

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Code availability

The code that supports the findings of this study is available from the corresponding authors upon reasonable request.

## Data availability

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

Consent to publish identifiable images of research participants was obtained.

We have complied with all relevant ethical regulations.

Guidelines for study procedures were provided by Imperial College London.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Sawers, A. & Ting, L. H. Perspectives on human-human sensorimotor interactions for the design of rehabilitation robots. J. Neuroeng. Rehabil. 11, 142 (2014).

2. Ganesh, G. et al. Two is better than one: physical interactions improve motor performance in humans. Sci. Rep. 4, 3824 (2014).

3. Jarrassé, N., Sanguineti, V. & Burdet, E. Slaves no longer: review on role assignment for human-robot joint motor action. Adapt. Behav. 22, 70–82 (2014).

4. Hesse, S. et al. Computerized arm training improves the motor control of the severely affected arm after stroke: a single-blinded randomized trial in two centers. Stroke 36, 1960–1966 (2005).

5. Hokayem, P. F. & Spong, M. W. Bilateral teleoperation: an historical survey. Automatica 42, 2035–2057 (2006).

6. Passenberg, C., Peer, A. & Buss, M. A survey of environment-, operator-, and task-adapted controllers for teleoperation systems. Mechatronics 20, 787–801 (2010).

7. Colombo, R. & Sanguineti, V. in Rehabilitation Robotics (eds Colombo, R. & Sanguineti, V.) 63–74 (Elsevier, 2018).

8. Marchal-Crespo, L. et al. The effect of haptic guidance and visual feedback on learning a complex tennis task. Exp. Brain Res. 231, 277–291 (2013).

9. Díaz, I., Gil, J. J. & Sánchez, E. Lower-limb robotic rehabilitation: literature review and challenges. J. Robot. 2011, 1–11 (2011).

10. Na, X. & Cole, D. J. Linear quadratic game and noncooperative predictive methods for potential application to modelling driver-AFS interactive steering control. Veh. Syst. Dyn. 51, 165–198 (2013).

11. Music, S. & Hirche, S. Control sharing in human-robot team interaction. Annu. Rev. Control 44, 342–354 (2017).

12. Khoramshahi, M. & Billard, A. A dynamical system approach to task-adaptation in physical human–robot interaction. Auton. Robot. https://doi.org/10.1007/s10514-018-9764-z (2018).

13. Jarrassé, N., Charalambous, T. & Burdet, E. A framework to describe, analyze and generate interactive motor behaviors. PLoS ONE 7, e49945 (2012).

14. Starr, A. W. & Ho, Y.-C. Nonzero-sum differential games. J. Optim. Theory Appl. 3, 184–206 (1969).

15. Basar, T. & Olsder, G. J. Dynamic Noncooperative Game Theory 2nd edn (Society for Industrial and Applied Mathematics, Philadelphia, 1999).

16. Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M. & Burdet, E. Physically interacting individuals estimate their partner’s movement goal to enhance motor abilities. Nat. Hum. Behav. 1, 0054 (2017).

17. Kiumarsi, B. et al. Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans. Neur. Netw. Learn. Syst. 29, 2042–2062 (2018).

18. Marden, J. R., Arslan, G. & Shamma, J. S. Joint strategy fictitious play with inertia for potential games. IEEE Trans. Autom. Contr. 54, 208–220 (2009).

19. Li, Y., Tee, K. P., Yan, R., Chan, W. L. & Wu, Y. A framework of human-robot coordination based on game theory and policy iteration. IEEE Trans. Robot. 32, 1408–1418 (2016).

20. Reinkensmeyer, D. J. et al. Computational neurorehabilitation: modeling plasticity and learning to predict recovery. J. Neuroeng. Rehabil. 13, 1–25 (2016).

21. Nierhoff, T., Leibrandt, K., Lorenz, T. & Hirche, S. Robotic billiards: understanding humans in order to counter them. IEEE Trans. Cybern. 46, 1889–1899 (2016).

22. Slotine, J.-J. E. & Li, W. Applied Nonlinear Control (Prentice-Hall, Upper Saddle River, 1991).

23. Gajic, Z. & Qureshi, M. T. J. Lyapunov Matrix Equation in System Stability and Control (Elsevier, Amsterdam, 1995).

24. Burdet, E., Franklin, D. W. & Milner, T. E. Human Robotics: Neuromechanics and Motor Control (MIT Press, Cambridge, MA, 2013).

25. Engwerda, J. Algorithms for computing Nash equilibria in deterministic LQ games. Comput. Manag. Sci. 4, 113–140 (2007).

26. Evrard, P. & Kheddar, A. Homotopy switching model for dyad haptic interaction in physical collaborative tasks. In Proc. IEEE Worldhaptics 45–50 (2009).

27. Emken, J. L., Benitez, R., Sideris, A., Bobrow, J. E. & Reinkensmeyer, D. J. Motor adaptation as a greedy optimization of error and effort. J. Neurophysiol. 97, 3997–4006 (2007).

28. Franklin, D. W. et al. CNS learns stable, accurate, and efficient movements using a simple algorithm. J. Neurosci. 28, 11165–11173 (2008).

29. Levin, M. F. et al. Deficits in the coordination of agonist and antagonist muscles in stroke patients: implications for normal motor control. Brain Res. 853, 352–369 (2000).

30. Colgate, J. E. et al. Methods and apparatus for manipulation of heavy payloads with intelligent assist devices. US patent 7185774 (2007).

31. Zoss, A. B., Kazerooni, H. & Chu, A. Biomechanical design of the Berkeley lower extremity exoskeleton (BLEEX). IEEE-ASME Trans. Mech. 11, 128–138 (2006).

32. Peshkin, M. A. et al. Cobot architecture. IEEE Trans. Robot. Autom. 17, 377–390 (2001).

33. Burgar, C. G. et al. Robot-assisted upper-limb therapy in acute rehabilitation setting following stroke: Department of Veterans Affairs multisite clinical trial. J. Rehabil. Res. Dev. 48, 445–458 (2011).

34. Chackochan, V. T. Development of Collaborative Strategies in Joint Action. PhD thesis, University of Genoa, Italy (2018).

35. Braun, D. A., Ortega, P. A. & Wolpert, D. M. Nash equilibria in multi-agent motor interactions. PLoS Comput. Biol. 5, e1000468 (2009).

36. Hogan, N. et al. Motions or muscles? Some behavioral factors underlying robotic assistance of motor recovery. J. Rehabil. Res. Dev. 43, 605 (2006).

37. Kahn, L. E. et al. Robot-assisted movement training for the stroke-impaired arm: does it matter what the robot does? J. Rehabil. Res. Dev. 43, 619 (2006).

38. Spong, M. & Vidyasagar, M. Robot Dynamics and Control (Wiley, Hoboken, 1989).

39. Codourey, A. & Burdet, E. A body-oriented method for finding a linear form of the dynamic equation of fully parallel robots. Proc. IEEE Int. Conf. Robot. 2, 1612–1618 (1997).

40. Campolo, D. et al. H-Man: a planar, H-shape cabled differential robotic manipulandum for experiments on human motor control. J. Neurosci. Meth. 235, 285–297 (2014).

## Acknowledgements

We thank J. Eden, T. Mylvaganam, N. P. Perez, Q.-C. Pham and K. P. Tee for their careful reading and comments on the manuscript. This research was supported in part by the European Commission grant EU-H2020 COGIMON (644727), UK EPSRC MOTION grant EP/NO29003/1 and Singapore MOE Tier1 grant RG48/17.

## Author information

### Affiliations

1. #### Department of Bioengineering, Imperial College of Science, Technology and Medicine, London, UK

   Y. Li, G. Carboni, F. Gonzalez & E. Burdet

2. Y. Li

3. #### School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore, Singapore

   D. Campolo & E. Burdet

### Contributions

Control concepts: Y.L. and E.B.; algorithm and simulation: Y.L.; set-up: F.G. and D.C.; experiments: G.C.; results analysis: Y.L., G.C. and E.B.; manuscript writing: Y.L. and E.B. All authors have read and edited the manuscript, and agree with its content.

### Competing interests

The authors declare no competing interests.

### Corresponding authors

Correspondence to Y. Li or E. Burdet.

## Supplementary information

### DOI

https://doi.org/10.1038/s42256-018-0010-3