Residual dynamics learning for trajectory tracking for multi-rotor aerial vehicles

This paper presents a technique to model the residual dynamics between a high-level planner and a low-level controller by considering reference trajectory tracking in a cluttered environment as an example scenario. We focus on minimising residual dynamics that arise due to only the kinematical modelling of high-level planning. The kinematical modelling is sufficient for such scenarios due to safety constraints and aggressive manoeuvres that are difficult to perform when the environment is cluttered and dynamic. We used a simplified motion model to represent the motion of the quadrotor when formulating the high-level planner. The Sparse Gaussian Process Regression-based technique is proposed to model the residual dynamics. Recently proposed Data-Driven MPC is targeting aggressive manoeuvres without considering obstacle constraints. The proposed technique is compared with Data-Driven MPC to estimate the residual dynamics error without considering obstacle constraints. The comparison results yield that the proposed technique helps to reduce the nominal model error by a factor of 2 on average. Further, the proposed complete framework was compared with four other trajectory-tracking approaches in terms of tracking the reference trajectory without colliding with obstacles. The proposed approach outperformed the others with less flight time without losing computational efficiency.


B z
a mapping matrix that projects the full state vector z onto the subspace of states relevant to the residual dynamics f norm nominal system dynamics, it can be either kinematic model or dynamic model f est augmented quadrotor motion model that comprises nominal dynamics and estimated residual dynamics by Sparse Gaussian Process (SGP) f d discrete augmented system dynamics g 1 (w), g 2 (w) the constraints enforced by system dynamics and obstacle for Nonlinear Model Predictive Control (NMPC) g m residual dynamics model that is learnt using Sparse Gaussian Process residual dynamics model that is learnt using Gaussian Process x k the robot's current state x k ∈ R n x , given by its position p k ∈ R 3 in meters and orientation α in radians at time step k xk the estimated robot state vector at time step k by using NMPC xk actual robot state vector at time step k after applying the current control ( ūk ) to the robot x m the optimal inducing points employed to construct an approximate multivariate Gaussian distribution by using SGP x re f desired reference robot state vector at time step k (x * ) i a testing data input point that comprises (x i , ūi , xi , xi ) y i a testing data expected output point y * the estimated residual dynamics for given x * z = [x; u] the state vector for the Gaussian Process δt the integration step for Runge-Kutta 4th order method

Introduction
Accurate reference trajectory tracking in cluttered environments 1 is still a challenging research problem.Sensing capabilities, e.g., how far the sensor can observe the environment, and computational and power capabilities are the main constraints that face when solving such research problems.Therefore, the quadrotor's full maneuverability can not be exploited due to those constraints.On the contrary, when the problem formulation uses a simplified motion model instead of a complex dynamical model, the residual dynamical error arises between the actual controller and desired control commands.
Figure 1.The high-level control command generation is based on Model Predictive Control (MPC).Residual dynamics (y) that arise between high-level control generated by MPC and low-level control generated by flight controller can not be estimated analytically.Hence, the Sparse Gaussian Process(SGP)-based learning technique is proposed to estimate y, where x k , x, and x are denoted current state, next desired state, and actual state after applying the current control to the system, respectively.
For controlling a quadrotor through high-speed agile maneuvers, it is necessary to consider aerodynamic drag effects, which act as the main component of residual dynamics and external aerodynamic effects, e.g., wind, that is applied on the quadrotor in addition to other constraints: dynamics and obstacle constraints.However, several studies related to agile maneuvers 2 , 3 , 4 , 5 , 6 do not consider such effects, which are very difficult to incorporate when modelling system dynamics, except approximating the quadrotor dynamics with simplified motion model 7 .Even if those effects are incorporated, the necessary external aerodynamic effects are difficult to obtain due to high-computational demands that leverage real-time performance.In other words, model complexity is constrained by the computational capabilities of the onboard controller.Nonetheless, such aerodynamic effects produce negligible impact for the low-speed maneuvers since dynamic effects can be neglected, considering only the kinematic modelling, especially in the plan-based control paradigm.Such a paradigm consists of two stages: planning and controlling.In the planning stage, a simplified motion model is utilized for generating near-optimal control policy, whilst in the controlling stage, the controller depends on the input from the planner and generates sufficient control commands based on the simplified motion model.However, utilizing an approximated motion model in the planning stage produces a dynamical error (residual dynamics) between the planner and the low-level controller that is hard to estimate analytically (Fig. 1).
In this research, we used a planner that consists of two sub-planners: a local planner that consists of a 4 degree of freedom model (simplified motion model) 8 and a replanner 9 that can push the reference trajectory out of obstacle zones.The local planner is formulated as a hard constraint optimization problem, whereas the replanner is formulated as a box-constrained function minimization problem, which iteratively refines the reference trajectory by pushing away from the obstacle zones.To compensate for the limitations of the simplified motion model, a learning mechanism based on the Sparse Gaussian Process (SGP) was introduced.This learning mechanism modeled the residual dynamics (y) that arose from the discrepancy between the simplified motion model and the actual system dynamics.As depicted in Fig. 1, the residual dynamics is determined by where g is the model that estimates residual dynamics, given current state and current control (x, u).The proposed approach employs velocity residual as the residual dynamics to integrate with the local planner's nominal dynamics, effectively reducing the discrepancy between the estimated control output from the local planner and the desired control that the simplified motion model is incapable of estimating.

Related works
Most recent trajectory generation and tracking methods are proposed either without considering obstacles or considering partially known or simplified obstacles, e.g., gates.For example, in 10 they have achieved 60km/h on a defined racing track.This approach uses Model Predictive Contouring Control (MPCC) 11 for local replanning at such a speed.However, such methods are not capable of flying through cluttered environments due to generating a time-optimal trajectory, ensuring safety and feasibility that are limited by onboard computational capacity.Graph-based planning techniques often rely on geometric methods like Jump Point Search (JPS) 12 , RRT 13 , and A 14 to generate an initial trajectory by identifying a path or a set of waypoints for a limited horizon.This initial trajectory serves as a guide for subsequent path-planning stages.This trajectory generation can be achieved by different methods, including minimum snap 15 , B-spline 9 .These generated trajectories, in general, do not guarantee dynamic feasibility.Kinodynamic-based trajectory planning 16 , 17 addressed these issues through kinodynamic search, whilst ensuing dynamic feasibility.Despite addressing such issues, kinodynamic-based trajectory planning remains a highly computational footprint.In 18 , the authors developed a kinodynamic-based framework, yet the framework failed to guarantee consistent smoothness.
Hierarchical trajectory planning 9 , e.g., global and local planning, is one of the ways that can address all the aforementioned issues when planning in a cluttered environment.In such planning, a global planner 19 tries to refine the desired trajectory considering safety constraints.Along with that, a local planner 8 ensures the dynamic feasibility and imposes safety constraints.In the DARPA subterranean challenge 20 , Tranzatto, M. et al. 21and De Petris, P. et al. 22 used a modified version of DJI M100 with position level control 23 relying on a highly accurate localisation module.Chung, T. H. et al. 24 , and Arm, P. et al. 25 used multiple depth sensors to build highly accurate elevation maps and groundbreaking local motion planning techniques, enabling robots to navigate effectively in complex and cluttered underground environments ensuring planning outputs are feasible for low-level controllers targeting exploration tasks.On the other hand, with fewer sensing capabilities, incorporating fully-fledged system dynamics makes it harder to solve trajectory planning whilst avoiding obstacles in real-time.Hence, the approaches that employ approximated system dynamics use external and internal disturbances alongside approximated dynamics 19 , 26 , 27 , 28 .Also, recent advancements in adaptive robust nonlinear control-based techniques 29,30 have demonstrated promising results for controlling aerial vehicles.Most of these techniques can be categorized under different variations of receding horizon-based control.A special variant of such control, GP-MPC (Gaussian Process MPC) excels in estimating unknown quantities, such as residual dynamics, internal disturbances, and external disturbances.A recent line of work 7 , 31 , 32 , 33 has been investigating how to incorporate Gaussian Process (GP) to learn system dynamics entirely or partially considering external disturbances, for example, wind 28 , from data, targeting applications such as reference trajectory tracking and reaching to a specified goal.The choice of GPs is preferred for system identification-related tasks, e.g., residual dynamics modelling, compared to more expressive modelling techniques such as neural networks in the recent past.GPs are preferred due to the modelling constraints for receding horizon planning, e.g., MPC and NMPC.It is sophisticated to develop the back-propagation algorithm 34 to estimate gradients of the cost function when formulating receding horizon planning problems within the optimization solver.However, for GPs, it is relatively easy to implement gradient estimation.Hence, variants of GPs are proposed for system identification-related tasks more often than the other techniques.
Instead of learning a complete dynamical model, the authors of 35 proposed to learn only time-varies state uncertainty using Gaussian belief propagation.Consequently, learning only the sub-portion of time-varies state or/and control uncertainty as the residual dynamics yields quite promising results 7 .The latter training technique requires less data to learn the residual dynamics.

3/17
Yet it is advisable to have more training data to improve the robustness to learn the residual dynamics.However, a large amount of training data is problematic when GP is applied for real-time applications.There are several ways to reduce the overhead that arises when utilizing a high volume of training data 36 : select a set of points, i.e, inducing points 37 , 38 , that approximate the original training data distribution; exploit the structure of the formation of GP 39 , 40 , 41 , e.g., Kronecker Structure 42 ; and use variational methods.Sparse GP 43 is such a variant that can be used in different ways to obtain inducing points: Subset of Regressors (SoR), Deterministic Training Conditional (DTC), Fully Independent Training Conditional (FITC) 38 , and Structured Kernel Interpolation (SKI) 39 .The complexity reduces from O(n 3 ) to O(nm 3 ), where n and m are the numbers of points (n > m) that are used to model GP and Sparse GP, respectively.Inducing point methods are quite appropriate for the real-time setting.However, model accuracy degrades as m decreases, which can be seen as a trade-off between model expressiveness reduction versus computational complexity.In this work, we present how to reduce residual dynamics that arise between high-level planning and low-level controlling by considering reference trajectory tracking as an example in the plan-based control paradigm (Table .1).The proposed approach is not limited to the considered example.It is valid for any motion planner which depends on a low-level controller.Depending on the representation of the motion model, the residual dynamics are defined by selecting a set of states and controlling input appropriately.The uniqueness of the proposed approach is that it is invariant to the geometry representation of the trajectory.Hence, once the residual dynamics model is trained, it can be deployed without retraining.The following sections explain the training process of such residual dynamics model and deploy it in hardware, which shows the effectiveness of incorporating residual dynamics modelling in high-level planning.
Our contributions are as follows: 1. To address the limitations of traditional motion models in capturing the actual system dynamics of aerial vehicles, we propose a hybrid approach that combines a simplified motion model for high-level planning with a learning-based technique, Sparse Gaussian Process, to estimate residual dynamics.This approach enables real-time performance by running the entire algorithm on an onboard computer.
2. To augment the replanner's 9 capability to circumvent local minima, we introduce a mechanism that dynamically regenerates the reference trajectory whenever the quadrotor's actual position deviates substantially from the initial reference trajectory.This enhancement effectively precludes the quadrotor from becoming ensnared in local minima during the receding horizon control process.

Methodology
This paper tackles the challenge of using a less accurate model (kinematic or dynamic) with Model Predictive Control (NMPC) by learning the residual errors between that approximated model and the actual model.Instead of a full 6-degree-of-freedom dynamic model, we employ a 4-degree-of-freedom model and propose a Sparse Gaussian Process technique to reduce the impact of dynamic factors that depend on the operating conditions and can hardly be modelled analytically in advance.Crucially, we aim to maintain trajectory planner performance, so we focus on learning the rate of velocity changes as residual dynamics.This choice is invariant to the trajectory's geometric representation, allowing offline learning of the residual dynamics distribution.By generating diverse trajectories, we capture the latent distribution of residual dynamics.We then augment the nominal model with this learned residual dynamics model and demonstrate it in online NMPC as a feedback controller, especially for the quadrotor in the plan-based robot control paradigm.
The plan-based robot control paradigm comprises two stages: planning and controlling.The planning stage can be a combination of several planners that run simultaneously or cascade planners 45 that run sequentially.In this work, the planning stage is formed by two planners: a replanner and a local planner.We have used the approach proposed in 9 as our replanner.
It executes continuous refinement of the provided reference trajectory out of obstacle regions.The proposed learning-based residual dynamics are augmented into a local planner 8 .Further, the subsections are organized as follows: formulating the local planner with residual dynamics in the Residual Augmented Quadrotor Motion Model; laying down training strategies of residual dynamics in the Nonlinear Model Predictive Control local planner, and estimating approximated distribution to represent the residual dynamics in Sparse GP Regression.

Residual Augmented Quadrotor Motion Model
The proposed residual dynamics learning framework can be integrated with various quadrotor kinematical and dynamical models when formulating NMPC.While the choice of model can significantly impact the effectiveness of the approach, it also introduces various challenges, including high computational demands, being stuck in local minima, low convergence rate, and real-time applicability.In NMPC, finding a delicate balance between avoiding local minima and managing computational costs is crucial.A longer prediction horizon helps evade local minima, but it also increases the number of optimization variables, demanding more computational resources.Several strategies exist to reduce these variables, but each has its own drawbacks.Increasing the discretization interval (δt): This relies heavily on the accuracy of the motion model.If the model is approximated, it can lead to unrealistic control commands.Increasing maximum control inputs (velocity, acceleration): this raises the risk of collisions with obstacles.Using high-degree-of-freedom models: if some degrees of freedom aren't fully utilized, unnecessary computational overhead is introduced.In this work, we chose a 4-degree-of-freedom model (f norm (x, u)) 8 , aligning with our focus on low-speed maneuvering.To compensate for potential inaccuracies, we employed Sparse Gaussian Regression to model residual dynamics.This approach strikes a balance between avoiding local minima and maintaining computational efficiency, demonstrating the intricate link between these two factors in NMPC.Let f est (x, u) be the augmented quadrotor motion model: where the state vector and the control inputs, defined x and u, respectively.Depending on which states and controls are learnt, B z gives the appropriate transformation, defined in the following sections.The Gaussian process was used to model the residual dynamics g(z) ∼ N(µ, Σ p ), z = [x; u], where Σ p is a i.i.d (independent and identically distributed) process noise and µ estimated mean value.The training data were centered, i.e., zero mean, before stating the learning process, which consists of only the diagonal covariance (or assuming no correlation between the two states of residual dynamics).The explicit Runge-Kutta 4th order algorithm was used to incorporate f est (x, u) motion model in the discrete-time f d (x k , u k , δt) with integration step size δt as where k depicts the time index.Since we rely on a less accurate model (or an approximate model) using the Runge-Kutta 4th order method 46 compared to the Eular, reduce the integration error of the discrete system dynamics.

Nonlinear Model Predictive Control Based Local Planner
To obtain augmented residual motion model f d (x, u, δt), the local planner calculates near-optimal control policy.However, the local planner does not guarantee optimality because NMPC-based planners calculate approximated control policy heuristically.
In complex and cluttered environments, generating a convex representation of free space is generally challenging.However, the incremental Euclidean Distance Transformation Map (EDTM) 8 , when applied along the reference trajectory, can provide accurate obstacle poses.These poses can be incorporated as hard constraints in the NMPC formulation, effectively addressing the non-linearity of obstacle constraints.NMPC is preferred over linear or convex MPC due to its ability to generate feasible solutions even when dealing with non-linear constraints.However, NMPC is susceptible to getting stuck in local minima, unlike convex MPC which guarantees global optimality.The following section outlines a proposed method to mitigate this issue.For a specified time index k, a corresponding slice of the reference trajectory [x re f k , ..., x re f k+N , u re f k , ..., u re f k+N ] and the current state x k , and close-in obstacles g 2 (w) information are available.The local planner, which was designed using multiple shooting technique, generates near-optimal control policy w = [ ūk , . . ., ūk+N−1 , xk , . . ., xk+N ] as follows: near-optimal control inputs and state after applying NMPC, and actual system state after applying estimated control inputs, respectively.We used the same PD regulator that was proposed in 8 .This work focuses on the DJI M100 quadrotor as the representative real system.
where the performance index (or objective function) J(w) aims to minimize the proportional error xk+i − x re f k+i and the control effect ūk+i − u re f k+i by employing a quadratic cost function.The continuous and differentiable nature of quadratic functions facilitates smooth optimization, enabling efficient solution via sequential quadratic programming (SQP) 47 .The prediction horizon length of NMPC, denoted N.Both maximum and minimum pairs of velocities and positions, denoted v max , x max and v min , x min , respectively.Both the Q and R matrices are positive semi-definite and positive definite, respectively.These matrices govern the importance of state variables and control inputs in the cost function, J(w).To obtain the discrete dynamical model, the Runge-Kutta 4th order algorithm is applied to the quadrotor's motion dynamics, resulting in the discrete dynamical model where δt is the time step.The system dynamic constraints are formulated based on the given discrete dynamical model f d (eq.2) 8 .
At each time step k, the desired reference pose and control are denoted by x re f k and u re f k , respectively.The reference trajectory, which provides the desired poses and control inputs, can be represented as a dth-degree polynomial function of time.Consequently, desired derivatives up to order d − 1 can be obtained at each time step k.Cubic uniform B-spline was employed to define the initial reference trajectory, which was subsequently refined using the global planner 9 .The obstacle constraints were formulated as a set of nonlinear hard constraints, g 2 (w), as follows: where N o is the number of obstacles at time k, and dis( , where d o is the safe zone distance between a quadrotor and close-in obstacles, denoted by x o j ∈ R 3 , j = 1, .., N o .For the descriptive formulations of g 1 (w), g 2 (w), x re f k , and u re f k , refers to 8,9 .

Augmented Residual Dynamics Learning with Gaussian Process(GP)
To learn residual dynamics g(z) eq.( 1), the first step is to collect a training dataset.As shown in Fig. 2, generate a set of random reference trajectories and formulate the local planner eq.( 2) considering only f norm (x, u).Let training dataset D be {X, y}, where X = {(x i , ūi , xi , xi )} n i=0 and y = {y i } n i=0 , i = 0, ..., n.Terms x i , ūi , xi and xi are current state, (predicted) control input and state using the local planner, and actual states after applying ūi to the system, respectively, at time t i .Residual dynamics y i at t i is determined by Thus, given a training dataset D, Gaussian Process is employed to learn g(z i ).Gaussian Process, in general, can be used to infer a distribution (prior) over function directly.Hence, residual dynamics can be determined as follows: where ε i is the corresponding n white noise realization σ n .Given an unseen x * data points ( or test dataset), GP, which is a way to estimate the posterior predictive distribution, estimates corresponding residual dynamics y * , that can be formulated as follows: where A = σ −2 n XX ⊤ + Σ −1 p and term s, denoted the prior distribution of the residual dynamics.Nonetheless, linear models, e.g., eq.( 6), suffer severely from limited expressiveness.Various methods can be used to transform input space (x i ) into different feature spaces using basis functions; however, the model (s) is still linear in parameters, which is analytically tractable.Let Φ( * ) be such a basic function, where * denotes input space.Along with that, posterior predictive distribution eq.( 7) becomes: This posterior predictive distribution can be reformulated as p(y * |x * , X, y) Thus, the entries of these matrices have invariable form Φ(x µ1 ) ⊤ Σ p Φ(x µ2 ), where x µ1 and x µ2 come either from training or testing sets.Let k(x µ1 , x µ2 ) = Φ(x µ1 ) ⊤ Σ p Φ(x µ2 ).Since Σ p is positive definite, we can define Σ 1/2 p .Moreover, Σ 1/2 p = UD 1/2 U ⊤ (using Singular Value Decomposition (SVD)).Thus, k(x µ1 , x µ2 ) = ψ(x µ1 ) • ψ(x µ2 ), where ψ(x µ1 ) = Σ 1/2 p Φ(x µ1 ).If x µ1 and x µ2 are similar, kernel k(x µ1 , x µ2 ) at these points, i.e., f (x µ1 ) and f (x µ2 ), must have similar values.
Since data centring is, in general, applied before applying regression, a GP is completely specified by its co-variance function k(x µ1 , x µ2 ) for a real process g(x) (or g(z), where z = [x; u]).We have used only x as the input for GP eq.( 5).Hence, we neglected u from the formulation).The g(x * ) ∼ GP(m(x), cov(x, x * )), where terms m , denoted mean for training data, mean for testing data and covariance between training and testing data, respectively.Since it is impossible to estimate k(x µ1 , x µ2 ) analytically, an approximated kernel function k, e.g., square exponential (SE) 48 , is required to estimate Σ p is defined as k , where σ f and l are hyper-parameters that have to learn; they are unique for the selected kernel function.Along with that, GP is formed to estimate y * analytically given x * as follows: Thus, using eq.( 8), the mean and covariance of the testing dataset can be fully determined as m(

Sparse GP Regression
Let y i = g i + ε i , g i = g(x i ), where g = {g i } n i=1 are the latent function values.Once GP is estimated, the prior p(g) can be determined.Therefore, the joint probability model p(y, g) = p(y|g)p(g) can be estimated.Sparse GP Regression aims to minimize the distance between the exact Gaussian distribution and a proposed posterior Gaussian distribution using variational approximation in which the number of training samples (m) is less than the number of training samples (n) for exact GP .Let p(h|y) = p(h|g)p(g|y)dg be such a posterior distribution, where p(h|g) denotes the prior conditional probability on the selected a set of auxiliary inducing points x m and h = g j + ε j , ε j ∼ N(0, σ m ), j = {0, ..., m}.ε j is the corresponding m white noise realization σ m .After selecting such points, g m be the approximated GP model.Therefore, p(h|y) can be reformulated as: p(h|y) = p(h|g m , g)p(g|g m , y)p(g m |y)dgdg m .Suppose h and g are conditionally independent given g m , i.e., p(h|g m , g) = p(h|g m ) or g m is sufficient to describe the distribution g: where Φ(g m ) = p(g m |y) ∼ N(µ, A).With that, mean and covariance of the approximated posterior GP are determined as respectively.Inducing points x m are estimated to reduce the distance between approximated and actual Gaussian distributions.The Kullback-Leibler (KL) divergence can be used to minimize the distance between the approximated posterior q(h) and the exact posterior p(g|y).The optimal estimation q * (h) = argmin (g m ) KL[q h; Φ(g m ) ||p(h|y)].Using Bayesian inference, i.e., p(h|y) = p(h, y)/p(y), KL divergence can be expanded in the following way: However, p(y) is an intractable integral, yet p(y) is independent of q(h; Φ(g m )).Thus, by maximizing the first two terms (evidence lower bound ELBO(q) = L(µ, A, x m )): true log marginal likelihood log(p(y)) can be calculated.Thus, relationship between ELBO(q) and KL[q(h; Φ(g m ))||p(h|y)] can be determined considering eq.( 10) and eq.( 11) as ELBO(q) = E[log p(y|h) ] − KL[q h; Φ(g m ) ||p(h)].Similarly, log marginal likelihood log(p(y)) can also be determined by log(p(y)) = ELBO(q) + KL[q h; Φ(g m ) ||p(h|y)].Since KL[q(h; Φ(g m ))||p(h|y)] ≥ 0 for any true distribution, i.e.KL is a distance, log(p(y)) ≥ ELBO(q) should hold.Thus, this inequality maximizes the ELBO on h, which eventually approximates the actual log-marginal likelihood as follows, which gives the optimal inducing points to represent the approximated Gaussian distribution.
Once we obtained the optimal values for x m , µ and A, which are the parameters of Φ(g m ), can be determined as respectively.After selecting the inducing points x m , only those points are considered to formulate the g(z) (eq.1).Afterwards, NMPC is formulated (eq.2) with an augmented residual dynamics model.

Experimental procedure and results
To assess the improvement of incorporating the proposed GP-based residual dynamics modelling into the local planner, we conducted several simulated and real-world experiments.For real-world experiments, the DJI M100 quadrotor was employed.
The developed software stack is built on the ROS1 framework and implemented in C++.Since the local planner generates velocity and angular velocity commands, we employed the velocity control interface provided by the DJI Onboard-SDK-ROS 23 .DJI control interface can handle control commands at a frequency of 20 Hz or higher.To bridge the gap between the DJI quadrotor and the local planner, we implemented a PD regulator 8 that ensures smooth command transmission at 30 Hz. Variable y y , defined the rate of the different between predicted velocity vy and actual velocity vy .Inducing points provide an approximation for y y .A Gaussian process is formed using such inducing points, i.e., sparse Gaussian process, which can be used to predict residual dynamics for corresponding input velocity v y .

Collecting testing dataset
Initially, twenty distinct trajectories were generated, as shown in Fig. 4. The objective was to collect a testing dataset in two forms: with and without obstacles.The testing dataset is structured as follows, which is similar to the training dataset: n, where n represents the number of data points.In this structure: vi , vi , y i ∈ R 3 and δ i ∈ R represent the estimated velocity by NMPC, the actual velocity after applying the estimated velocity, the velocity residual, and the time difference between two consecutive data points, respectively.The terms vi , vi , and y i are expressed as: vµ , vµ , y µ , µ ∈ {x, y, z}, where y µ and v µ denote the expected residual dynamics and velocity component in the µ direction, respectively.These values are calculated at time δ i when using only the nominal model (f norm ).For simplicity, δ i is defined as δ v .Accordingly, the residual dynamics of the actual system are given by: The data were acquired using both a PX4-enabled quadrotor in a Gazebo-based simulated environment and a DJI M100 quadrotor (Fig. 4) in real-world conditions.Since residual dynamics is invariant to the geometric representation of trajectory but variant to velocity Additionally, since we used two different quadrotors for simulated and real-world experiments, residual dynamics learning was carried out separately for both cases.However, for simplicity, we explained the results and finding considering the real-world setup.

Modeling residual dynamics distribution
In cluttered environments, the desired velocity is automatically reduced due to obstacle constraints and computational limitations, both of which are addressed by the local planner.Consequently, in such environments, accurate residual dynamics estimation is more crucial for lower velocities than for higher velocities.This is because the velocity profile typically falls within a lower range, such as 0 to 2 meters per second, particularly in cluttered environments with limited sensing capabilities.Hence, g(z)(eq.5) was modelled such that it could generate residual dynamics with low uncertainty for such velocities, i.e., by selecting more inducing points closer to lower velocity regions.We have compared two inducing points selection approaches, i.e., the proposed approach and the cluster selection approach 7 , for capturing residual dynamics along each axis: y x , y y , and y z ).The

9/17
Figure 5.The relationship between residual dynamics y y and input velocity v y .In left sub-figure: the first plot displays how y y and v y vary over time, while the second plot shows the approximation of the y y using two different approaches: Sparse Gaussian Process-based (proposed) and cluster selection approach 7 , in right sub-figure: residual dynamics before and after incorporating the learnt Sparse Gaussian process.Root Mean Square Error drops from 0.4012 to 0.0751 after introducing the residual dynamics into the nominal motion model results of our experiments are given concerning the y axis, i.e., y y .However, we applied the same procedure to other axes as well.The latter approach targets high-speed maneuvering provided that inducing points should capture the whole distribution, which is shown in Fig. 5 in green dots.In contrast, the proposed approach selects more inducing points (pink triangles) towards lower velocity ranges.

Assessing the effect of residual dynamics produces on the nominal dynamics
To assess which effect residual dynamics produces on the nominal dynamics eq.( 1), we used the testing dataset that was acquired without considering obstacles ({y µ , v µ , δ v }).Let ȳµ = g(v µ ) be the expected residual dynamics.Then, the nominal error and the augmented residual error are fully determined by y µ • δ v and (y µ − ȳµ ) • δ v .We observed, that the accuracy of the augmented residual dynamics model has increased drastically compared to the standalone nominal model (Fig. 5).Such an improvement helps to significantly reduce the trajectory tracking error since the control policy that is generated by NMPC after incorporating augmented residual dynamics can cope with the mean of epistemic uncertainty, residual dynamics, robustly.

Selecting an optimal number of inducing points
The execution time that is taken to estimate control policy is generated based on NMPC for maneuvering the quadrotor.This time is crucial for a smooth flight experience.We have experimented to find the correlation among the selected number of inducing points, NMPC execution time on two different embedded devices, and model accuracy (Fig. 6).As a result, we have selected only 30 inducing points to represent the residual dynamics.Such many inducing points were selected mainly due to two reasons.First, the accuracy improvement rate is almost negligible compared to NMPC execution time which grows Figure 6.Correlation between the number of inducing points and computational power.The number of inducing points is inversely proportional to augmented dynamics residual RMSE, yet proportional to computational power for two embedded computers: Nvidia Xavier and Nvidia Xavier NX.Since RMSE shows no significant change compared to computation time, we selected 30 as the number of inducing points for formulating the Sparse Gaussian Process exponentially when the number of inducing points is increased.Second, the Nvidia Xavier NX embedded computer was used for onboard computation.Moreover, there is no considerable advantage to using Nvidia Xavier over Nvidia Xavier NX since 10/17 the optimization problem, i.e., local planner, is solved sequentially.The residual dynamics model learning is performed as an offline process, ensuring it does not impact real-time performance considerably.Therefore, the Graphics Processing Unit (GPU) is only utilized during the learning stage.Once the residual dynamics are estimated as a latent distribution, the model parameters are incorporated into the nominal model (eq.2).We employed Casadi 49 with IPOPT 50 as the primary solver for the local planner.Therefore, employing a large training dataset that accurately captures the residual dynamics distribution and carefully selecting optimal inducing points to represent the residual dynamics distribution effectively minimizes the residual dynamics of the model while having a minimum burden on computational cost.
Estimating root mean square error considering with and without obstacles Furthermore, we have calculated root mean square error (RMSE) with and without considering obstacles (Table .2) in a simulated environment.The control policy generation can differ when considering obstacle constraints because NMPC can generate different control policies over time for different obstacle constraints.Thus, we kept the same initial reference trajectory running the planner with and without considering obstacles and then estimated the RMSE of residual dynamics.RMSE is Such behaviour is mainly due to the lack of stability of the global map, which is built incrementally as a result of updates within a small range, e.g., 5m from the quadrotor pose.Hence, the replanner iteratively refines the trajectory when the global map is changed.Thus, the local planner has to incorporate more obstacles and constraints from time to time.Such behaviour can lead to the trap of the quadrotor in local minima.None of such problems exists in obstacle-free environments.

Avoiding trapping the quadrotor in local minima
The initial reference trajectory can be within the obstacle zones since the quadrotor has no prior knowledge about the environment beyond the sensing range.Hence, the quadrotor can be far from the reference trajectory in the presence of a huge obstacle through the reference trajectory (Fig. 7).In such a situation, the replanner 9 fails to refine the trajectory due to time constraints.To overcome this issue, we proposed to regenerate a reference trajectory provided that the replanner can push the reference trajectory further away from the obstacles.In Fig. 7, one such experiment shows how the proposed approach worked in real-world conditions.

Experimenting in real-world conditions
To evaluate the proposed approach, we conducted two real-world experiments.In the first experiment, a set of twenty initial reference trajectories (Fig. 8) was generated for traversing an obstacle-free environment with a length range of 40 to 60 meters.
The trajectories varied in their geometrical representations, while the maximum allowable velocity was set to 1.5 m/s.We first executed the experiment without considering residual dynamic estimation in the nominal model.Then, we repeated the experiment after incorporating residual dynamics.Both experiments were repeated three times for each trajectory to calculate the average RMSE of the augmented residual dynamics.Compared to the provided initial reference trajectory, the average RMSE for trajectories with and without augmented residual dynamics were 0.0851 and 0.2681, respectively.This indicates that incorporating residual dynamics reduced the residual dynamics error by almost half.The simulated experiment results (Table .2) and real-world results show similar RMSE in obstacle-free environments.In the second experiment, we only compared the RMSE of the trajectories after incorporating residual dynamics due to the dynamic nature of an environment with obstacles.The initial reference trajectory passes through a cluttered environment (Fig. 9, which is approximately 60m long.The traversed trajectory distance is approximately 80m.The local planner we selected Figure 9. Experimental results for the proposed approach in a real-world setting.The initial reference trajectory within the obstacles is pushed iteratively.The distance between the refined trajectory and its closest obstacle poses is 1.5m.We set a 1.5 m distance between the refined trajectory (traversed trajectory) and the quadrotor pose.Due to computational constraints, the map of the environment is built 5m from the quadrotor pose incrementally, though LiDAR can measure up to 100m.Sub-figure (b) depicts the velocity profile for the experiment.Due to our reliance on GPS for quadrotor height estimation, the velocity along the z-axis exhibited significant instability compared to the other two axes.
was specifically designed for low-speed maneuvering, as the primary objective was to navigate in cluttered environments.Consequently, we set the maximum velocity to 1.5 m/s.However, this restriction is not inherent to the proposed residual dynamics modeling approach.The initial reference trajectory was regenerated within the said traversed trajectory distance due to the local minima.Such an occasion can be seen in the provided video 51 .The RMSE of the augmented residual dynamics is 0.1018 (m/s).Even though it is hard to estimate the trajectory smoothness quantitatively, i.e., to assess control policy generation after introducing augmented residual dynamics, the smoothness has improved after incorporating augmented residual dynamics into the nominal model (Table .2).

Comparing with other similar recent works
We compared the proposed approach with four different trajectory planners: sampling-based grid search followed by trajectory generation, NMPC-based replanner, hybrid planner (local and global), and KinoJGM.All the selected planners have their advantages and limitations with respect to low-speed maneuvers.The matrices that we considered are as follows: (1) mean computation time (MCT) (execution time for one iteration of the planner), (2) success rate (SR) (the number of times quadrotor reaches from start to goal without any collision), and (3) traversed distance (TD) (total distance of quadrotor that is traversed).
In this experiment, we use a random forest with a density of (60m×60m×10m).After that, we selected 10 different start and target poses and executed each planner.The experiment results are given in Table .3. For the sampling-based planner, MCT is high because it generates nondeterministic trajectories due to the nature of the planner.Replanner generates locally near-optimal control policies, yet execution time is high due to design constraints, i.e., the local planner is formulated as a Compared to the listed two planners, the hybrid approach has high MCT and SR.The latter planner executes two sub-planners, local and global, simultaneously.Hence, the planner observes the environment in the distance and generates a near-optimal control policy to reach the goal.However, since a simplified motion model was used to generate near-optimal control policy, dynamics uncertainty (or residual dynamics) between desired control that the planner generated and the actual control that the quadrotor obtained is considerably high.Such behaviour leads to trapping the quadrotor in local minima.The proposed approach is a modification of this planner by addressing two issues.Traps in local minima and residual dynamics are the two issues that the proposed planner addressed.Resolving those issues helps to obtain a low TD compared to the hybrid planner and a slightly higher SR.In addition, the proposed planner was compared with the KinoJGM planner and its variants: GP-MPC and KinoJSS.Those results were provided by the authors of KinoJGM.GP-MPC planner is conceptually similar to the proposed one.However, it has no recovery mechanism and ways to avoid local minima unlike ours.These are the reasons the proposed planner achieved higher SR compared to GP-MPC.MCT of the proposed is slightly lower compared to GP-MPC.However, the obtained MCT is acceptable for low-speed maneuvers.

Discussion
The trajectory planning problems in the plan-based control paradigm, in general, are solved adhering to these steps: path search 52 , initial trajectory generation and trajectory refinement 53 , high-level control command generation that can be achieved by several approaches, including differential flatness mapping 54 , receding horizon planning, and finally low-level control commands generation using a flight controller, for example, PX4, DJI.These flight controllers operate independently irrespective of high-level planners.Moreover, due to their independence, such controllers reduce the overhead and complexity of developing high-level planning algorithms.In other words, the same planner can be deployed on different firmware by implementing an interface between a high-level planner and a low-level controller.Therefore, residual dynamics arise between the high-level planner and the low-level controller.Gaussian Process-based techniques are used to model these residual dynamics in the recent past not only for MAVs but also other types of vehicles 55 , 56 .
In recent years, MAVs-related manifestations, e.g., trajectory tracking, exploration, expanding the range for venturing out computationally expensive techniques in various disciplines, including agriculture, aerial photography, and crop monitoring 57 , 58 .Low-speed maneuvering, in general, is preferred over high-speed maneuvering in executing such demanding tasks due to task complexity.An example of such low-speed maneuver need is trajectory planning, where the environment is obstacle-rich.Therefore, the kinematic modelling of a quadrotor was considered since the scope of this work is for low-speed maneuvers without consideration of dynamic effects (external or/and internal).Similar assumptions were used in 59 for weed detection in crops, which showed the necessity of low-speed maneuvering.On the other hand, several studies have been carried out for highspeed maneuvers 4 , 5 , 6 with consideration of residual dynamics estimation, while most of them struggle with high-computational demands for on-board processing and difficulty in maneuvering in cluttered environments.The proposed framework can be adapted for high-speed maneuvers as well.However, it requires adding a behavioural planner that switches between planners according to the requirements, for example, in the cluttering environment we can switch to the low maneuvers while keeping the same control command generation.That will give us several benefits compared to existing solutions, e.g., residual dynamics models can be trained separately for low-speed maneuvers and high-speed maneuvers.Hence, the proposed solution can be adapted to general control with high and low-speed maneuvers introducing two or more operation modes and switching between them based on the external conditions.
Since Sparse Gaussian Processing is a non-parametric model, it does not require any parameter tuning.However, it is necessary to provide a proper training set that can capture the whole distribution of the latent space, i.e., residual dynamics.The simulated experiments of the proposed approach were carried out with an IRIS quadrotor with a PX4 flight controller.The real-world experiments were carried out with a DJI M100 with an A3 flight controller.In both cases, the initial step was to fine-tune the hyper-parameters of the PD regulator.This process is required before running the proposed motion planner.Suppose the proposed approach is deployed in another MAV.In that case, it is required to fine-turn the listed parameters since the high-level planner does not know any information about the low-level controller.However, this is a one-time process.Recently, auto PID tuning approaches were proposed 60 .Hence, it is required to investigate these aspects to improve the parameter tuning of the proposed approach.
The source code and complete experiments are available at Github 51 p k the robot's current position p µ k , µ ∈ {x, y, z} p k the robot's current velocity v µ k , µ ∈ {x, y, z} u re f desired reference robot control input at time step k u k the robot's current control inputs u k ∈ R n u , influencing its linear velocity v k ∈ R 3 in meters per second and angular velocity ω ūk the estimated control input at time step k by NMPC ûk actual control input at time step k after applying the current control ( ūk ) to the robot 1 arXiv:2305.15791v2[cs.RO] 22 Jan 2024 w the estimated state trajectories and corresponding control inputs ([ ūk , . . ., ūk+N−1 , xk , . . ., xk+N ]) estimated from the NMPC

Figure 2 .
Figure2.The high-level overview of the proposed framework for learning the residual dynamics.Design matrix B z defines which states and control inputs should be trained.Learned residual dynamics g(z k+i−1 ) is added to nominal dynamics f norm (x k+i−1 , u k+i−1 ) when formulating NMPC by using the multiple shooting technique.In the training stage, collect the data D = {X, y} = {(x i , ūi , xi , xi ), y i }, i = 0, ..., n for n number of times, where (x i , ūi , xi , and xi ) denote current state, estimated near-optimal control inputs and state after applying NMPC, and actual system state after applying estimated control inputs, respectively.We used the same PD regulator that was proposed in8 .This work focuses on the DJI M100 quadrotor as the representative real system.

Figure 3 .
Figure3.Epistemic uncertainty propagation of residual dynamics y y for changing of input velocity v y , where subscript y indicates the y axis.Variable y y , defined the rate of the different between predicted velocity vy and actual velocity vy .Inducing points provide an approximation for y y .A Gaussian process is formed using such inducing points, i.e., sparse Gaussian process, which can be used to predict residual dynamics for corresponding input velocity v y .

Figure 4 .
Figure 4.The right sub-figure: A trajectory of motion along all the axes was used to collect the data for learning and testing the latent representation of residual dynamics (Fig. 3).The left sub-figure: DJI M100 quadrotor is used for real-world experiments and PX4-enabled quadrotor is used for simulated environments

Figure 7 .
Figure 7.When the initial reference trajectory and the current pose of the quadrotor are far away from each other (left-side figure), e.g., 2m, regenerate the reference trajectory such that the replanner can try to push the reference trajectory further (the right-side figure)

Figure 8 .
Figure 8. Example trajectory that is used to estimate the average RMSE of the augmented residual dynamics

Table 1 .
Comparative analysis of Gaussian Process-based residual dynamics learning techniques with the proposed technique

Table 2 .
Comparison of how different types of inducing points selection affect the nominal model error augmented residual dynamics model is applied in obstacle-free environments, compared to obstacle-cluttered environments.

Table 3 .
Comparative analysis in terms of goal-reaching.Ten different start and goal poses were considered whilst keeping the same environment 26: Success Fraction, MCT: Mean Computation Time, P t n : MPC prediction horizon length, u r : map update range incrementally, x m : number of inducing points, s t : sampling time, KinoJSS; GP-MPC; KinoJGM are three different variants of26constraint nonlinear optimization problem.Moreover, the quadrotor can trap in local minima since the planner optimizes locally.