Abstract
The last decades have seen a surge of robots working in contact with humans. However, until now these contact robots have made little use of the opportunities offered by physical interaction and lack a systematic methodology to produce versatile behaviours. Here, we develop an interactive robot controller able to understand the control strategy of the human user and react optimally to their movements. We demonstrate that combining an observer with a differential game theory controller can induce a stable interaction between the two partners, precisely identify each other’s control law, and allow them to successfully perform the task with minimum effort. Simulations and experiments with human subjects demonstrate these properties and illustrate how this controller can induce different representative interaction strategies.
Main
Traditional robotic manipulators, such as those used in the automotive industry, are potentially dangerous to human workers and thus are kept separated from them. However, robotic systems have been increasingly used in recent decades to work in physical contact with humans. While the potentialities of such contact robots are suggested by the benefits observed during physical tasks carried between humans^{1,2}, they have so far made little use of the opportunities of interactive control^{3}: either they are fully controlled by the operator (as in ‘master–slave’ behaviour mode^{4,5,6}), or, conversely, their control law does not adapt during movement to the interaction with the user, for example in typical rehabilitation robots^{7}. To efficiently assist a human user, a contact robot should arguably react and adapt to their specific motion behaviour according to a desired control strategy.
How should a contact robot be controlled to provide a stable and appropriate response to a user with unknown dynamics during various activities ranging from sport training, to physical rehabilitation and shared driving^{8,9,10}? Specific human–robot interactions have been studied^{11,12}, but a general framework for interactive control is still missing. It has been suggested that differential game theory (GT) can be used as a framework to describe various interactive behaviours between a robot and its human user^{13}. However, GT typically assumes knowledge of the partner’s dynamics and control strategies^{14,15}, while a contact robot cannot a priori know the sensorimotor control of the human user.
On the other hand, the study of physical tasks between humans has revealed how the capability to understand the partner’s sensorimotor control and to adapt one’s own control is key to interaction benefits^{16}. Some recent papers have studied how finite games can deal with limited knowledge of the partner^{17,18}, but it is unclear how the available techniques should be transferred to the continuous control of the interaction with a robot. A method to model the interaction force of a contact robot with its user has been introduced in ref. ^{19}, but in an approach requiring force sensing and restricted to the ‘collaboration’ between equals^{13}. However, different control strategies are required for various tasks, such as ‘lessthanneeded assistance’ to promote successful learning in physical rehabilitation^{20}, or a deceptive behaviour needed for ‘competition’ that can be used to challenge a user^{21}.
Therefore, our goal is to develop a versatile interactive motion behaviour for a robot in physical contact with a human user, with which these two agents can optimally react to each other by learning each other’s control. For this purpose, we model the robot’s and human’s task objectives through respective cost functions in a GT framework, which enables us to specify the desired control strategy^{13}. However, the mechanical connection between the partners means that their individual behaviours will interfere and may lead to instability. It is unclear how to ensure stability of the human–robot system and task completion while identifying the human user’s control strategy and adapting the robot’s control. To address these issues for simultaneous coadaptation and control, we propose using adaptive control^{22} to estimate the partner’s cost function and GT to compute a Nash equilibrium and resolve possibly conflicting situations where each partner optimizes their own behavior based on the available knowledge^{23}. This adaptive GT controller is validated on reaching arm movements. It does not require measurement of the interaction force, and is proved to yield a stable interaction in typical conditions. The partner’s controller model can be used to specify different types of behaviour, which we illustrate in experiments with human subjects exhibiting lessthanneeded assistance required for physical rehabilitation and a competitive behaviour.
A game theory controller that understands the partner
As human motion planning is carried out essentially in task space^{24}, we describe the control of a contact robot and its human user in the corresponding space {x} of the robot’s end effector, where \(x\in {{{\mathbb{R}}}}^{n}\). The control can then be transferred to the joint space as described in the Methods. The variables used throughout this Article are listed in Table 1. Let
describe how a contact robot (with command \(v\in {{{\mathbb{R}}}}^{n}\)) and a human user (applying a force \({u}_{{{\rm{h}}}}\in {{{\mathbb{R}}}}^{n}\)) move the robot’s dynamics \(\Psi (x,\dot{x},{\ddot{x}})\). Selecting the mechanical impedance process
to track a common and fixed target \({x}_{\rm{d}}\in {{{\mathbb{R}}}}^{n}\) (with \({\dot{x}}_{\rm{d}}=0\)), the control equation yields:
where \(u\in {{{\mathbb{R}}}}^{n}\) will be computed according to linear GT as described below, 0_{n} is the n × n zero matrix, 1_{n} the n × n identity matrix (with 1 as diagonal coefficients and 0 elsewhere), and I and D are the inertia and viscosity matrices specifying the desired interaction dynamics.
Using noncooperative differential GT^{15}, we can describe the robot’s interaction with the human user (equation (3)) as a game between two players who minimize their respective cost functional
where (·)^{T} represents the transpose operator and \(\hat{(\cdot )}\) the estimate of (·). These cost functionals mean that the robot knows its cost U and estimates the cost \({{\hat{{{U}}}}}_{{{\rm{h}}}}\) of the human user, where \({{\hat{{{u}}}}}_{{{\rm{h}}}}\) is the estimation of their motor command. Each of the constant weight matrices Q and \({\widehat{Q}}_{{{\rm{h}}}}\) can be positive semidefinite or negative semidefinite. In these equations, each player fulfils the reaching task by minimizing the error to the target while using minimal metabolic cost. Since the state ξ includes two parts, the target error x − x_{d} and the velocity \(\dot{x}\), Q and \({\widehat{Q}}_{{{\rm{h}}}}\) include two components corresponding to position regulation and viscosity, respectively. Based on differential GT for linear systems^{25}, the following control inputs of the robot and its modelled human user minimize the cost functionals (equation (4)) in the sense of the Nash equilibrium:
where L ≡ [L_{e}, L_{v}] (obtained using the solution of the Riccati equation (equations (5a,b))) is the feedback gain of the robot’s control and \({\widehat{L}}_{{{\rm{h}}}}\equiv \left[{L}_{{{\rm{h}}},{{\rm{e}}}},{L}_{{{\rm{h}}},{{\rm{v}}}}\right]\) (using equations (5c,d)) is the robot’s estimate of the human feedback gain. Note how the robot’s and modelled human’s controls depend on each other through A_{r} and A_{h}, characterizing the coupled optimization.
Importantly, the human user’s control gain L_{h} is not a priori known to the robot. However, it can be estimated from equation (3) without requiring force sensing (that is, u_{h}) for actions along nongravityaffected directions, while ensuring stability of the closedloop system. For this purpose, we first express how the estimation \({{\hat{{{u}}}}}_{{{\rm{h}}}}\) affects the state ξ to \(\widehat{\xi }\):
We use this equation as an observer for u_{h}, where Γ is a matrix to make Γ − A positive definite, \(\widehat{\xi }\) the estimate of ξ and \(\widetilde{\xi }\) the state estimation error. Subtracting equation (3) from equation (6) then yields
As the state estimation error \(\widetilde{\xi }\) is due to the error \({\widetilde{L}}_{{{\rm{h}}}}\), we develop an update law for \({\widehat{P}}_{{{\rm{h}}}}\) (thus for \({\widehat{L}}_{{{\rm{h}}}}\equiv {B}^{{{\rm{T}}}}{\widehat{P}}_{{{\rm{h}}}}\)) to minimize \(\widetilde{\xi }\). For this purpose, we consider the Lyapunov function candidate
where tr(M) is the trace of matrix M. Its time derivative yields
for some positive definite matrix Γ_{0} ≡ Γ − A, where we have used the system dynamics (equation (3)) and estimation error dynamics (equation (7)). Setting
the last two terms of equation (9) cancel, yielding a nonpositive \(\dot{V}\), while \({\widehat{Q}}_{{{\rm{h}}}}\) can be computed from \({\widehat{P}}_{{{\rm{h}}}}\) using equation (5d). Equation (10) is critical as it enables identifying the control of the partner in order to adapt one’s own control using equation (5). The above derivation leads to the following theorem (demonstrated in the Methods) that summarizes our results on coadaptation and simultaneous control:
Considering the system of equation (3), if the robot and human estimate the partner’s controller and develop their own control as in equations (5), (6) and (10), then:

the closedloop system is stable and u, u_{h}, ξ are bounded

the partners’ controllers and cost functions converge to the correct values if ξ is persistently exciting

the Nash equilibrium is achieved, that is, the cost functions in equation (4) will increase if either of the control inputs differ from those in equation (5)
Importantly, the interactive controller with coadaptation and partner’s identification described in equations (5), (6) and (10) can be used to implement representative interaction control behaviours such as those identified in ref. ^{13}. ‘Coactivity’, when two agents ignore each other’s dynamics, is implemented using for their control the respective part of equations (4) and (5) with \({\widehat{P}}_{{{\rm{h}}}}\equiv 0\) and \(\widehat{P}\equiv 0\), where \(\widehat{P}\) is the human’s estimate of P, yielding two independent linear quadratic regulators (LQR). Collaboration between two partners without hierarchy arises when they contribute to the dynamics (equation (3)) using each controller of equation (5) with positive definite {Q, Q_{h}}. Competition can be implemented with a negative definite gain matrix Q for the robot in order to challenge the human user achieving their goal by pushing them away from it, which will promote active learning^{20}. ‘Cooperation’ arises when the two agents take complementary roles, which can be defined through the sharing rule
enabling the GT controller to continuously modify the contributions between the partners^{26}. Different from collaboration, cooperation fixes the task performance through the total weight C, and uses equation (11) to share the effort. A special case of cooperation, ‘assistance’, arises when C is set to let the robot fulfil the task alone without interaction with a human user. As humans tend to relax during motor actions^{27,28}, the robot will gradually take over the whole task load while the human will become passive. Another special case of cooperation particularly important in physical training is assistlessthanneeded^{7,20}, designed to keep a human trainee engaged during sport practice or physical rehabilitation. This interaction control mode is implemented by setting C to make the robot short of reaching the target alone, which will induce the human to increase their effort in order to reach the target.
A pseudocode summarizing the steps of the proposed algorithm is
Input: Current state ξ, target x_{d}.
Output: Robot’s control input u, estimated human’s cost function weight \(\hat Q_h\) in equation (5d).
begin
Define the target position x_{d}, initialize Q, \(\hat Q_h\), u, \(\hat u_h\), \(\hat \xi \), \(\hat P_h\), set the parameters \(\Gamma \) in equation (6), α in equation (10), and the terminal time t_{f} of one trial. In the case of cooperation set also C.
while t < t_{f} do

Measure the position x and velocity \(\dot x\), and form the state ξ.

Calculate the estimation error \(\tilde \xi \) and the estimated state \(\hat \xi \) using equation (6): \(\tilde \xi \equiv \hat \xi  \xi ,\,\dot{\hat{\xi}} = A\hat \xi + B(u + \hat u_h)  \Gamma \tilde \xi \).

Update the matrix \(\hat P_h\) using equation (10): \(\dot{\hat P}_h \equiv \alpha (\tilde \xi  \xi )\xi ^T\). Then compute the estimated human’s control gain \(\hat L_h \equiv B^T\hat P_h\) and motor command \(\hat u_h \equiv  \hat L_h\xi \).

Solve the Riccati equation in equation (5b) to yield P, and compute the robot’s control input \(u =  B^TP\xi \).

Calculate the estimated human’s cost function weight \(\hat Q_h\) according to the Riccati equation in equation (5d).
In the case of cooperation, use equation (11) to adapt the corresponding cost function weight \(Q = C  \hat Q_h\).
Simulations and experiments
To test the interactive control and coadaptation of the two agents, we simulated a neurorehabilitation scenario of arm reaching movements back and forth between −10 cm and +10 cm (Fig. 1). The robot dynamics were simulated using equations (5), (6) and (10) to generate u, the simulated human dynamics used a similar set of equations to generate u_{h}, and both of these inputs were used to drive the task dynamics according to equation (3). Both agents were assumed to have no initial knowledge of the partner’s control (thus, initially \({\widehat{P}}_{{{\rm{h}}}}\equiv 0\) and \(\widehat{P}\equiv 0\)) while equations (6) and (10) were used to estimate it. Figure 1a shows that the collaborative GT controller yields a stable behaviour fulfilling the reaching task. With this controller the human control gain was identified with an error smaller than 5% of the real value in 10 s.
Could the reaching task also be fulfilled using a more simple control law neglecting the partner’s control? We addressed this question by setting \({\widehat{P}}_{{{\rm{h}}}}=0\) and \(\widehat{P}=0\) in equation (5) for the simulated robot and human agents, respectively. This leads to the simplified solution of two agents with independent LQR ignoring the interaction with the partner. Simulation of the reaching task, displayed in the dashed blue lines of Fig. 1a, suggests that while the task is generally fulfilled with the independent LQR, this would, however, require larger control gains L_{h}, thus a larger effort compared with the GT controller. Both the LQR gains and the GT gains depend on the predefined cost functionals. To investigate the effect of the costs systematically, we examined the gains in LQR and GT when Q_{h} varied from Q/10 to 10Q. We see in Fig. 1b that the LQR gains are always larger than the GT gains, with the difference becoming smaller when Q_{h} increases, as the robot’s relative influence decreases. The GT controller considers the interaction with the simulated human (through equation (5b)) and computes the minimal force for the robot to fulfil the task with them. In turn, the simulated human can also minimize their effort.
Is the partner’s control identification really necessary, or could any approximate partner’s model be used instead? To test this question we carried out reaching simulations with a biased identification. Figure 1c,d show that a bias in the partner’s velocity estimation can bring instability to the system and prevent it from reaching the target. The theorem’s demonstration in the Methods shows that both the bias level leading to instability and the relative effect of position and velocity gains on stability depend on the system matrix A and on the cost functionals. Using these cost functionals with the values set in the Methods, both position and velocity gains are positive regardless of the estimation bias. However, the system matrix A is unstable due to damping. Therefore, the closedloop system will become unstable if the velocity gain cannot compensate for the damping due to estimation of velocity gain with a bias larger than 50 N m^{–1}. Figure 1c,d illustrate the system’s behaviour in the stable case of a 30 N m^{–1} bias and in the unstable case of a 50 N m^{–1} bias, respectively. These results illustrate the importance of accurately identifying the partner’s control based on a sound analysis of the dynamics involved, as was developed in equations (6)–(10).
To test the ability of the controller to adapt to any partner’s motor condition, we simulated the scenario of an individual with motor weakness who uses the robot to help recover their motor function (Fig. 2a–c). We assumed that the simulated human improves their arm control by increasing the weight Q_{h} trial after trial as described in the Methods. Robot interaction was adapted after each trial based on the estimated weight of the simulated human’s cost function. To promote active learning while still assisting a weak user, the simulated robot used an assistlessthanneeded cooperation strategy, with equation (11) and C making the robot short of reaching the target alone. We see that while the reaching task is completed with similar performance in all trials (Fig. 2c), this is achieved with different contributions of the simulated human and robot (Fig. 2a,b). Specifically, Fig. 2a,b shows how the robot updates its estimate of the simulated human motor control weight Q_{h} and reduces its own weight Q correspondingly.
Interestingly, it appears in Fig. 2b that the robot’s weight in the fifth iteration decreases to a negative value, that is, the robot provides resistance (instead of assistance) to the simulated human. This form of competition is desirable as it will let the user try to improve performance and be engaged in the physical training. However, the robot will not force the trainee beyond their motor capabilities, as it will keep compensating for the missing dynamics through equation (11). In turn, this illustrates the capability of the adaptive GT controller to induce various types of interactive behaviour, namely here, lessthanneeded assistance and competition. The simulation results also illustrate that the two components in the weight matrices for reaching and viscosity (that may correspond to spasticity in a neurologically impaired user^{29}) can be modulated separately, which makes it possible to implement requirements in various conditions.
Considering the scenario of an impaired user such as a stroke survivor using the robot to train their motor control, their behaviour may be erratic with irregular progresses and setbacks over consecutive training sessions. To test the capability of our interactive controller to deal with such erratic behaviours, we simulated a human user with performance varying trial after trial, which is implemented by random changes in the corresponding cost function. Figure 2d shows that even in this situation the robot can adapt to the changing performance by updating its own cost function weight in a suitable way trial after trial.
How will our designed interaction behaviour and adaptation work with real human users? We tested this by implementing lessthanneeded assistance control during arm reaching (on the robotic interface shown in Fig. 3a) as illustrated in Fig. 3b. The red traces in Fig. 3c illustrate the robot’s reaching behaviour without interaction with a user. We can observe a nearly constant motion in consecutive trials with estimated weight of the human user at ≈0, as is expected. The movement is short of 3 cm from the target due to the low C parameter set in the controller via equation (11). The pink traces exhibit the adaptation taking place in the healthy subject scenario where the robot assists a representative user instructed to complete the reaching task as best as possible. We observe that the reaching task is effectively carried out, while the robot successfully identifies the increasing contribution of the human user and lowers its assistance correspondingly. Interestingly, the jumps in the weights at the intervals when the movement direction changes stimulate the convergence of the estimated values to the real values. When the subject is instructed to keep the arm passive, in a weak arm scenario, the robot observes this behaviour with a low estimated weight Q_{h} and increases its motor weight Q correspondingly.
To test whether the behaviour with our GT controller depends on subjectspecific parameters, we repeated the same experiment with ten healthy subjects, who carried out five trials with three backandforth reaching movements, in each of the healthy and weak arm conditions. Figure 3d shows that the controller does not prevent natural motion variability in consecutive trials, which may facilitate learning^{20,24}. For each subject, the Q_{h,e} value converged to reliable values, yielding values with healthy > no interaction (ttest, max{p_{i}} < 0.009, i = 1…10) and with weak arm < no interaction (ttest, max{p_{i}} < 0.012, i = 1…10). This demonstrates that our algorithm is able to clearly identify the different behaviours of each subject and adapt the GT control correspondingly. This is further illustrated in Fig. 3e, where we asked a representative subject to frequently vary his behaviour, as can be expected in neurologically impaired individuals. The result shows that the controller is able to catch these behavioural changes in a suitable way. Altogether, these results demonstrate the efficiency of the simple regulation of assistance during cooperation of the human and robot through the sharing rule (equation (11)) with the GT controller (of equations (5), (6) and (10)).
Discussion
Contact robots that physically interact with humans are being increasingly used in industry and for physical training, but they lack a systematic methodology to produce versatile interactive behaviours. Typical rehabilitation robots to help limb movement training^{7}, or intelligent industrial systems to support heavy objects against gravity and facilitate their manipulation^{30}, use a controller independent of the human user. This coactivity type of interaction strategy, which does not require the robot to observe the human user behaviour, works only as long as the robot’s task corresponds to the user’s task. Conversely, in master–slave control the robot fully follows the user’s movement. Master–slave control has been used for teleoperated robots, in force extender exoskeletons^{31}, for redirecting movements to admissible areas in Northwestern University’s cobots^{32}, or for robotassisted mirror physical therapy where the unaffected arm of an hemiplegic patient drives the impaired arm^{4,33}. These two simple interaction behaviours, where the robot either ignores the human operator control (in coactivity) or blindly follows it (in master–slave control) have led to successful systems for specific tasks. However, the example of sophisticated interaction strategies between humans promise a more versatile and flexible interaction optimally combining the two partners’ capabilities^{1,2,16}. Such strategies require an efficient process to understand the partner’s control, which has not been developed in previous interactive controllers.
To design such versatile interactive control, this work thus addressed the basic human–robot interaction control issue, where a pair of physically connected agents have to understand each other’s control and update their own control in order to successfully complete a reaching task together. This simultaneous partner’s control identification and coadaptation has not been addressed in previous works^{19,34}, and it was unclear whether this could lead to an equilibrium or would result in instability. Integrating differential GT and an observer, the adaptive controller presented in this Article provides a stable solution to this problem, with bounded control signals, identification of the partner’s control law and minimization of the individual cost in both agents in the sense of the Nash equilibrium. The simulation results showed that the proposed method enables the robot and human to estimate each other’s cost function during interaction accurately and without force sensing, as well as to adapt their control correspondingly to fulfil the common task while guaranteeing a specific interactive behaviour. The adaptive properties of the control and specific behaviours were further shown in a robotic implementation with human users mimicking physical training for motor recovery.
Does the success of this implementation require that the humans’ motor control corresponds to GT? It is still unknown whether the human central nervous system behaves as predicted by GT to physically interact with other humans, although there is evidence that it considers the partner’s sensorimotor control during common task performance^{16} and may carry out behavioural decisions that conform to GT^{35}. However, irrespective of whether humans’ sensorimotor control corresponds to GT, the controller of equations (5), (6) and (10) will induce both a stable interaction with the human user (as it adapts its control gain to compensate for the human bounded but possibly unstable control gain), and an interactive behaviour corresponding to a desired control strategy. In turn, the algorithm introduced here may be used to infer how human motor control corresponds to GT, and to determine which characteristic behaviours are adopted during various tasks.
The simulation and implementation results demonstrate that the new adaptive GT controller can flexibly implement different representative behaviours of physical interaction between two agents as proposed in refs. ^{1,13}. These include the collaboration or competition between partners without a hierarchy, the cooperation of complementary partners, as well as assistlessthanneeded assistance promoting engagement and active training. Therefore, this work achieved the programme of ref. ^{13}, by explicating the critical partner’s ‘action understanding’ necessary for an efficient interactive control and by developing the versatile differential GT control algorithm to implement these behaviours. In contrast, the GT controller presented in ref. ^{19} could only implement a collaboration and required force sensing.
The algorithm for the simultaneous partner’s control identification and coadaptation, which was developed for a reaching movement here, may be generalized to more complex movements such as tracking^{16}. It can be used to regulate the interaction between any two autonomous agents, such as two robots, or to design human–robot interaction as was demonstrated in this Article. It may further serve as a tool for modelling bimanual control with brain interhemispherical communication in both healthy individuals and neurologically impaired (for example, cerebral palsy) subjects, as well as for modelling the control of physical interaction in a human pair.
The experiments reported here illustrate that the presented algorithm can also be directly used as a dynamic environment for training impaired reaching behaviour, for example after a stroke. Interesting properties of the adaptive controller for training arm movements include the dynamic interaction not preventing natural motion variability in consecutive trials, as well as the stable and lessthanneeded motion assistance, from nearly complete motion guidance to competition; both of these factors are critical for successful neurorehabilitation^{36,37}. The experiment demonstrated the adaptability of the robot control, yielding a challenging but supportive training environment and inducing active inconspicuous adaptation.
Methods
Joint space control
The rigid body dynamics of many serial mechanisms can be represented through an equation of the form^{38}
where q is the joint space vector, H the mass matrix, \(N\dot{q}\) the Coriolis and centrifugal forces (which can be completed with other velocitydependent forces such as damping), and G the gravity term. This equation can be transformed to a positiondependent force \({\mathit{\Psi}} (x,\dot{x},{\ddot{x}})\) using kinematic transformations
where x is the robot end effector pose and F the endpoint wrench. For a redundant mechanism, the pseudoinverse \({J}^{\dagger }={J}^{{{\rm{T}}}}{\left(J{J}^{{{\rm{T}}}}\right)}^{1}\) of the Jacobian J can replace J^{−1} to compute the robot endpoint force, as well as its joints angle, angular velocity and acceleration. In the case of a parallel mechanism, the dynamics will generally already be in a form \({\mathit{\Psi}} (x,\dot{x},{\ddot{x}})\)^{39}.
Theorem demonstration
Substituting update law equation (10) into equation (9) yields
\(BL+B{\widehat{L}}_{{{\rm{h}}}}A\) is positive definite if Q is positive definite, according to equation (5b). As Γ_{0} is also positive definite, it follows \({{{\rm{lim}}}}_{t\to \infty }\left\Vert \xi (t)\right\Vert =0\) and \({{{\rm{lim}}}}_{t\to \infty }\left\Vert \widetilde{\xi }(t)\right\Vert =0\). Therefore, ξ is bounded and \({{{\rm{lim}}}}_{t\to \infty }\left\Vert \dot{\widetilde{\xi }}(t)\right\Vert =0\) if \({{{\rm{lim}}}}_{t\to \infty }\left\Vert \dot{\widetilde{\xi }}(t)\right\Vert\) exists. According to the estimation error dynamics (equation (7)) we thus have \({{{\rm{lim}}}}_{t\to \infty }{\widetilde{L}}_{{{\rm{h}}}}(t)\xi (t)={0}_{n\times 1}\). If ξ is persistently exciting, that is, there exist positive constants t_{0}, T_{0} and β with \({\int }_{t}^{t+{T}_{0}}\xi (s){\xi }^{{{\rm{T}}}}(s){{\rm{d}}}s\ge \beta {1}_{2n}\forall t\ge {t}_{0}\), it follows \({{{\rm{lim}}}}_{t\to \infty }\left\Vert {\widetilde{L}}_{h}(t)\right\Vert =0\), thus \({\widetilde{L}}_{{{\rm{h}}}}(t)\) is bounded. L_{h} is assumed to be bounded since it is the human’s control gain. Therefore, \({\widehat{L}}_{{{\rm{h}}}}\) is also bounded. According to equation (5b), A_{r} is bounded, thus so are P, L and u. The boundedness of u_{h} follows from the boundedness of L_{h} and ξ.
Using equation (5d), we can compute the estimation error \({\widetilde{Q}}_{{{\rm{h}}}}={\widehat{Q}}_{{{\rm{h}}}}{Q}_{{{\rm{h}}}}\) that is due to the errors \({\widetilde{P}}_{{{\rm{h}}}}\) and \(\widetilde{P}\). As both of these errors converge to zero with the human and robot estimations, we have \({\widetilde{Q}}_{{{\rm{h}}}}\to 0\), that is, the human’s unknown cost function weight Q_{h} is identified by the robot. It can be shown similarly that the robot’s cost function weight Q is identified by the human.
Multiplying the robot equation in equation (5b) by \({\widehat{\xi }}^{{{\rm{T}}}}\) on the left side and by \(\widehat{\xi }\) on the right side, and considering the estimation error dynamics equation (7) yields
Taking the limits \(\widetilde{\xi }\to {0}_{2n\times 1}\) and \(\dot{\widetilde{\xi }}\to {0}_{2n\times 1}\) yields
It can be shown similarly that
as \({\widetilde{\xi }}_{{{\rm{h}}}}\to {0}_{2n\times 1}\) and \({\dot{\widetilde{\xi }}}_{{{\rm{h}}}}\to {0}_{2n\times 1}\), where \({\widetilde{\xi }}_{{{\rm{h}}}}={\widehat{\xi }}_{{{\rm{h}}}}\xi\) is the state estimation error of the human. ω_{h} → 0 and ω → 0 indicate that the Nash equilibrium is achieved for the human–robot interaction system^{25}.
Simulations
Ten backandforth reaching movements were simulated between −10 cm and +10 cm with a total time of 40 s. The simulation parameters were set as: inertia I = 6 kg, damping D = −0.2 N m^{–1}, observer gain Γ ≡ diag(10, 1), learning rate α ≡ 10^{4}. The collaborative interaction of Fig. 1 was controlled using Q = Q_{h} ≡ diag(100, 0). Biased identification was simulated by adding [0, 30] Ns m^{–1} and [0, 50] Ns m^{–1} offsets to the estimated robot and human velocity gains, respectively.
Motor recovery corresponding to the simulation shown in Fig. 2a–c was implemented as an iterative increase of the weight in the human’s cost function with Q_{h} ≡ diag(10, 0) + i diag(60, 0), where i is the trial number, while C ≡ diag(200, 0) was set for the robot to fulfil the reaching task alone. The human’s erratic behaviour shown in Fig. 2d was simulated by using a uniformly distributed random number ρ in the weight of the human’s cost function with Q_{h} ≡ diag(200, 0) + ρ diag(−100, 0).
Experiment
Setup
The validation with (healthy) human subjects was carried out using the HMan, a cableactuated 2DOF manipulandum^{40}. The version fabricated at Imperial College has a rectangular 24 × 28 cm^{2} workspace and visual feedback colocated with planar arm movements provided on the screen (Fig. 3a,b). The control of the two Maxon motors was carried out at 1,000 Hz using realtime LabVIEW while the (x, y) position was computed from the motors’ optical encoders.
Task
The reaching task included three backandforth reaching movements between −10 cm and +10 cm with a total time of 12 s, which was long enough for the robot to identify the human user’s control. This experiment was approved by the ethics committee at Imperial College (no. 16IC3580). The actual HMan robot dynamics \({{\mathit{\Psi }}}\equiv I{\ddot{x}}+D\dot{x}\) were used as the desired dynamics. The following control parameters were used throughout the experiment: desired task performance weight C ≡ 4,000, observer gain Γ ≡ 30, learning rate α ≡ 2 × 10^{4}, and initial parameter \({\widehat{P}}_{{{\rm{h}}}}\equiv 0\). C was tuned so that the robot without interaction comes within 3 cm from the target.
Protocol
Ten naive subjects without known sensorimotor impairment participated in the experiment, after having given their informed consent. Each subject was instructed to sit in front of the robotic interface facing the monitor and move the handle by following ‘move forward’ and ‘backwards’ instructions displayed on the monitor. Subjects were allowed to practice until they became familiar with the reaching task and the robotic interface. Subsequently, each subject performed five trials (with one trial consisting of three backandforth reaching movements) in each of the two experimental conditions ‘healthy’ and ‘weak arm’, in which they were instructed to reach the target and to completely relax the arm, respectively. The control condition ‘no interaction’ was run by the robot alone for 50 times.
Data analysis
For each subject and each condition of healthy or weak arm, a ttest was applied on the last value of Q_{h,e} of the five trials to determine the difference from the no interaction condition. The fourth trial of subject 1 had to be discarded due to a recording error as the subject started that trial too early, so there were four values in this case. For each subject, the difference was significant with p < 1.2% and we report the maximum p value over the subjects.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Code availability
The code that supports the findings of this study is available from the corresponding authors upon reasonable request.
Data availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.
Additional information
Identifiable images
Consent to publish identifiable images of research participants was obtained.
Ethical Compliance
We have complied with all relevant ethical regulations.
Ethics Committee
Guidelines for study procedures were provided by Imperial College London.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1.
Sawers, A. & Ting, L. H. Perspectives on humanhuman sensorimotor interactions for the design of rehabilitation robots. J. Neuroeng. Rehabil. 11, 142 (2014).
 2.
Ganesh, G. et al. Two is better than one: physical interactions improve motor performance in humans. Sci. Rep. 4, 3824 (2014).
 3.
Jarrassé, N., Sanguineti, V. & Burdet, E. Slaves no longer: review on role assignment for humanrobot joint motor action. Adapt. Behav. 22, 70–82 (2014).
 4.
Hesse, S. et al. Computerized arm training improves the motor control of the severely affected arm after stroke: a singleblinded randomized trial in two centers. Stroke 36, 1960–1966 (2005).
 5.
Hokayem, P. F. & Spong, M. W. Bilateral teleoperation: an historical survey. Automatica 42, 2035–2057 (2006).
 6.
Passenberg, C., Peer, A. & Buss, M. A survey of environment, operator, and taskadapted controllers for teleoperation systems. Mechatronics 20, 787–801 (2010).
 7.
Colombo, R. & Sanguineti, V. in Rehabilitation Robotics (eds Colombo, R. & Sanguineti, V.) 63–74 (Elsevier, 2018).
 8.
MarchalCrespo, L. et al. The effect of haptic guidance and visual feedback on learning a complex tennis task. Exp. Brain Res. 231, 277–291 (2013).
 9.
Díaz, I., Gil, J. J. & Sánchez, E. Lowerlimb robotic rehabilitation: literature review and challenges. J. Robot. 2011, 1–11 (2011).
 10.
Na, X. & Cole, D. J. Linear quadratic game and noncooperative predictive methods for potential application to modelling driverAFS interactive steering control. Veh. Sys. Dyn. 51, 165–198 (2013).
 11.
Music, S. & Hirche, S. Control sharing in humanrobot team interaction. Annu. Rev. Control 44, 342–354 (2017).
 12.
Khoramshahi, M. & Billard, A. A dynamical system approach to taskadaptation in physical human–robot interaction. Auton. Robot. https://doi.org/10.1007/s105140189764z (2018).
 13.
Jarrassé, N., Charalambous, T. & Burdet, E. A framework to describe, analyze and generate interactive motor behaviors. PLoS ONE 7, e49945 (2012).
 14.
Starr, A. W. & Ho, Y.C. Nonzerosum differential games. J. Optim. Theory Appl. 3, 184–206 (1969).
 15.
Basar, T. & Olsder, G. J. Dynamic Noncooperative Game Theory 2nd edn (Society for Industrial and Applied Mathematics, Philadelphia, 1999).
 16.
Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M. & Burdet, E. Physically interacting individuals estimate their partner’s movement goal to enhance motor abilities. Nat. Hum. Behav. 1, 0054 (2017).
 17.
Kiumarsi, B. et al. Optimal and autonomous control using reinforcement learning: a survey. IEEE Trans. Neur. Netw. Learn. Syst. 29, 2042–2062 (2018).
 18.
Marden, J. R., Arslan, G. & Shamma, J. S. Joint strategy fictitious play with inertia for potential games. IEEE Trans. Autom. Contr. 54, 208–220 (2009).
 19.
Li, Y., Tee, K. P., Yan, R., Chan, W. L. & Wu, Y. A framework of humanrobot coordination based on game theory and policy iteration. IEEE Trans. Robot. 32, 1408–1418 (2016).
 20.
Reinkensmeyer, D. J. et al. Computational neurorehabilitation: modeling plasticity and learning to predict recoverys. J. Neuroeng. Rehabil. 13, 1–25 (2016).
 21.
Nierhoff, T., Leibrandt, K., Lorenz, T. & Hirche, S. Robotic billiards: understanding humans in order to counter them. IEEE Trans. Cybern. 46, 1889–1899 (2016).
 22.
Slotine, J.J. E. & Li, W. Applied Nonlinear Control (PrenticeHall, Upper Saddle River, 1991).
 23.
Gajic, Z. & Qureshi, M. T. J. Lyapunov Matrix Equation in System Stability and Control (Elsevier, Amsterdam, 1995).
 24.
Burdet, E., Franklin, D. W. & Milner, T. E. Human Robotics: Neuromechanics and Motor Control (MIT Press, Cambridge, MA, 2013).
 25.
Engwerda, J. Algorithms for computing Nash equilibria in deterministic LQ games. Comput. Manag. Sci. 4, 113–140 (2007).
 26.
Evrard, P. & Kheddar, A. Homotopy switching model for dyad haptic interaction in physical collaborative tasks. In Proc. IEEE Worldhaptics 45–50 (2009).
 27.
Emken, J. L., Benitez, R., Sideris, A., Bobrow, J. E. & Reinkensmeyer, D. J. Motor adaptation as a greedy optimization of error and effort. J. Neurophysiol. 97, 3997–4006 (2007).
 28.
Franklin, D. W. et al. CNS learns stable, accurate, and efficient movements using a simple algorithm. J. Neurosci. 28, 11165–11173 (2008).
 29.
Levin, M. F. et al. Deficits in the coordination of agonist and antagonist muscles in stroke patients: implications for normal motor control. Brain Res. 853, 352–369 (2000).
 30.
Colgate, J. E. et al. Methods and apparatus for manipulation of heavy payloads with intelligent assist devices. US patent 7185774 (2007).
 31.
Zoss, A. B., Kazerooni, H. & Chu, A. Biomechanical design of the Berkeley lower extremity exoskeleton (BLEEX). IEEEASME Trans. Mech. 11, 128–138 (2006).
 32.
Peshkin, M. A. et al. Cobot architecture. IEEE Trans. Robot. Autom. 17, 377–390 (2001).
 33.
Burgar, C. G. et al. Robotassisted upperlimb therapy in acute rehabilitation setting following stroke: Department of Veterans Affairs multisite clinical trial. J. Rehabil. Res. Dev. 48, 445–458 (2011).
 34.
Chackochan, V. T. Development of Collaborative Strategies in Joint Action. PhD thesis, University of Genoa, Italy (2018).
 35.
Braun, D. A., Ortega, P. A. & Wolpert, D. M. Nash equilibria in multiagent motor interactions. PLoS Comput. Biol. 5, e1000468 (2009).
 36.
Hogan, N. et al. Motions or muscles? Some behavioral factors underlying robotic assistance of motor recovery. J. Rehabil. Res. Dev. 43, 605 (2006).
 37.
Kahn, L. E. et al. Robotassisted movement training for the strokeimpaired arm: does it matter what the robot does? J. Rehabil. Res. Dev. 43, 619 (2006).
 38.
Spong, M. & Vidyasagar, M. Robot Dynamics and Control (Wiley, Hoboken, 1989).
 39.
Codourey, A. & Burdet, E. A bodyoriented method for finding a linear form of the dynamic equation of fully parallel robots. Proc. IEEE Int. Conf. Robot. 2, 1612–1618 (1997).
 40.
Campolo, D. et al. HMan: a planar, Hshape cabled differential robotic manipulandum for experiments on human motor control. J. Neurosci. Meth. 235, 285–297 (2014).
Acknowledgements
We thank J. Eden, T. Mylvaganam, N. P. Perez, Q.C. Pham and K. P. Tee for their careful reading and comments on the manuscript. This research was supported in part by the European Commission grant EUH2020 COGIMON (644727), UK EPSRC MOTION grant EP/NO29003/1 and Singapore MOE Tier1 grant RG48/17.
Author information
Affiliations
Contributions
Control concepts: Y.L. and E.B.; algorithm and simulation: Y.L.; setup: F.G. and D.C.; experiments: G.C.; results analysis: Y.L., G.C. and E.B.; manuscript writing: Y.L. and E.B. All authors have read and edited the manuscript, and agree with its content.
Competing interests
The authors declare no competing interests.
Corresponding authors
Supplementary information
Rights and permissions
About this article
Received
Accepted
Published
Issue Date
DOI
Further reading

Ask this robot for a helping hand
Nature Machine Intelligence (2019)