Fusion localization for indoor airplane inspection using visual inertial odometry and ultrasonic RTLS

In this paper, the fusion localization system for the visual inertial odometry (VIO) and ultrasonic real-time localization system (RTLS) for indoor airplane inspection using drones is proposed. In a hangar environment, either trilateration-based RTLS or vision-based localization shows disadvantages and neither can be used alone. In this research, we design a configuration of VIO suitable for hangar environment and outlier filter on ultrasonic RTLS for non-line of sight situations, so that both can be fused using graph optimization. The proposed solution can provide more accurate localization than the visual odometry-only system as well as continue estimating positions in the absence of RTLS data. Localization and real-time performance of the proposed algorithm are evaluated through experimentation in a hangar and a flight test in an outdoor space.

As for RTLS usage, Wi-Fi trilateration for warehouse localization is evaluated and tested in 6 .It is important to note that the author mentions that Wi-Fi signal attenuation and reflections affect the accuracy.Therefore, we aim to propose a fusion localization system that combines drone onboard odometry with ultrasonic or UWB-based RTLS to compensate for error accumulation.
As for research on localization in a hangar for airplane inspection, Pugliese et al. 7 proposed LIDAR-based localization taking advantage of geometric knowledge of the target airplane's fuselage.However, the algorithm used in this proposed methodology cannot produce three-dimensional Euclidean coordinates; it only provides the distance information from an airplane fuselage cylinder axis to a drone.Tappe et al. 8 implemented an indoor hangar localization by LIDAR-based LoLa SLAM (Low-Latency SLAM) 9 .However, the authors also commented that further research on fusing with another sensor could be helpful as a laser has drawbacks in measuring range on some materials such as glass or mirrors.
In this paper, we propose the fusion of an ultrasonic trilateration-based RTLS with the visual inertial odometry (VIO) within a hanger.We configure the VIO system for hangar use, including camera type, camera direction, and visual odometry (VO) algorithm.To handle non-line-of-sight (non-LOS) situations, we design an RTLS outlier filter that uses VIO as a reference.Localization from VIO and outlier-filtered RTLS is fused using a graph optimization algorithm.The contributions of this research are as follows: • Accumulated drift error of VIO is compensated by RTLS • In the absence of RTLS data, localization can continue by VIO • VIO configuration suitable for a hangar is proposed • An outlier filter to reject abnormal values from RTLS is designed • Real-time performance is achieved and tested through a flight test The organization of this paper is as follows: "Localization system design for an airplane inspection drone" section proposes a fusion localization system that combines VIO and ultrasonic RTLS for airplane inspection drones."Experiment" section describes the hangar tests (Fig. 1) and the outdoor flight tests for the suggested fusion localization system."Result" section shows that the suggested solution can continue localization even when ultrasonic RTLS positions are abnormal."Result" section also shows that fusion with the ultrasonic RTLS has greater accuracy than VIO alone, and its capability to perform a real-time flight.

Fusion system architecture
Ultrasonic RTLS systems offer precision within a few centimeters, which is sufficient for indoor drone navigation 10 .However, when drones fly close to airplanes, ultrasonic waves can be disrupted or reflected by an airplane's fuselage or wings, resulting in unstable RTLS localization.This necessitates the use of secondary localization.
As for VO-based localization, their pose output period is faster than ultrasonic RTLS and the calculation results are more precise in a short time duration.In a hangar environment, however, VO suffers from a lack of feature points due to the uniform color painting on the airplane's surfaces and the insufficient presence of objects such as maintenance tools and equipment on a hangar floor.In addition, VO needs to eliminate the drift error, which accumulates gradually, using a compensation algorithm such as loop closure, which occurs when the same place is visited again 3 .However, loop closure may not occur in an automated inspection flight path, so another method is necessary to compensate for VO's error in airplane inspection applications.
Therefore, to take advantage of both RTLS and visual odometry, we propose a fusion localization system that combines ultrasonic RTLS and stereo camera VIO, using an outlier filter and graph-based optimization as described in Fig. 2. Regarding VIO, we implement buffer management on images and features to ensure real-time capability.We opt for an optical flow-based VIO algorithm due to its high localization accuracy in the context of airplane inspection missions.Detailed VIO configuration adequate for airplane inspection application in a Figure 1.Test environment in a hangar.hangar is described in "VIO configuration" section.As for ultrasonic RTLS, we address unstable position data in the case of non-LOS and ultrasonic reflection through the outlier filter which is described in "Ultrasonic RTLS" section.Graph optimization design is explained in "Graph optimization fusion" section.When formulating a graph consisting of RTLS and VIO, we apply size marginalization by limiting the number of graph nodes and edges (Fig. 2) to maintain real-time computational capabilities.

VIO configuration
VIO configuration for localization in a hangar environment is analyzed and selected with respect to camera direction, camera type, and visual odometry algorithm.An inspection drone equipped with a VIO localization system has two types of cameras: a VIO camera and an inspection camera.The VIO camera uses a global shutter image sensor to capture moving objects without distortion, while the inspection camera uses a rolling shutter image sensor to capture high-resolution still images.Because this research focuses on a VIO configuration, the term "camera" refers to the VIO camera.

Camera direction
Regarding camera direction, a camera direction configuration turning away from the airplane yields the best VO performance.In airplane inspection missions, inspection drones typically fly at a distance of 2-3 m from an airplane's body.As a result, if a camera is directed toward the airplane, the field of view (FOV) of a camera is blocked by the airplane's body, causing a lack of feature points for VO calculation.In addition, lights from the hangar ceiling can be reflected by the surface of an airplane body, resulting in distortions in the vision-based pose estimation.The images in Fig. 3a and c show detected feature points, while Fig. 3b and d display the resulting trajectories obtained using stereo images when a drone hanging from a hoist moves closely around a target  www.nature.com/scientificreports/airplane as in Fig. 1a.It can be seen in Fig. 3a that feature points are sparse on the airplane fuselage and light is reflected on the upper body of an airplane when a direction configuration is toward an airplane.At around 5 m in height, the light reflection on the upper body of the airplane causes the estimated trajectory to become distorted with a curved line when it should be straight (Fig. 3b).
On the other hand, when the camera is turned away from the airplane, thanks to the clear FOV, more feature points are visible on hangar walls, other aircraft, and maintenance objects on the hangar floor (Fig. 3c).As seen in Fig. 3d, the trajectory is well estimated due to the additional feature points and exhibits no curved lines.It is important to note that if there are only a few objects on a floor within the view of the camera, VO may have to rely solely on hangar wall features, which could lead to a pose estimation inaccuracy.In such cases, fusion with RTLS for compensation of VO drift is crucial.
The last direction configuration that can be considered is toward the ceiling of the hangar.Although the ceiling has plenty of feature points on the beam structures as can be seen in Fig. 3a, high ceiling heights (e.g., 20 m) are prone to scale error in VO calculation.Thus, using the ceiling view for camera direction configuration may not be appropriate, leading to our decision to use a camera direction turned away from the airplane.

Camera type
Comparing Red Green Blue-Depth (RGB-D), mono, and stereo cameras for VO, the stereo camera is the most suitable camera type for airplane drone inspection because stereo VO can utilize the view turned away from an airplane and recover the scale in translation calculation.
Although a RGB-D camera has the advantage of providing depth information, it has a limited effective range.Due to the range constraints, the camera view should be directed toward the aircraft, leading to feature point deficiencies in VO.For evaluation of RGB-D-based VO, a drone flight was conducted while fixing its camera view to a Boeing 727-200S airplane installed outdoors.RGB-D data were acquired by onboard Intel Realsense D435i which has an effective range limit of around 5 m (Fig. 4a,b).The results of the RGB-D VO shows that it can continue to track features located on windows in the fuselage part despite not having enough features (Fig. 4c,d).However, in the tail part where there are no windows, VO loses its tracking due to a lack of feature points as shown in Fig. 4e and f.
Mono and stereo cameras can both be considered for VO when capturing images while facing away from an airplane.Stereo cameras are more suitable for VO due to their capability to recover scale in translation, which mono cameras lack.Evaluation of various Stereo VO algorithms is detailed in "Visual odometry algorithm" section.As for a feature-based visual localization algorithm, ORB-SLAM2 (VO) using stereo images and ORB-SLAM3 (VIO) 13 using stereo images and inertial measurement unit (IMU) data are tested.As for an optical flow-based visual localization algorithm, VINS-Fusion 14 VO and VINS-Fusion VIO are evaluated.
In the test environment of Fig. 1b, a drone was moved by walking at a constant speed along the reference trajectory A shown in Fig. 6a.The reference trajectory followed lines on the floor of the hangar, and its length was measured using a laser ranger.It had a rectangular shape with dimensions of 16.2 by 7.2 m.The total distance covered by the drone during the test was 46.8 m.During the test, stereo images were acquired using Intel Realsense D435i, and inertial data were collected from the IMU sensor in Pixhawk FCC (flight control computer).The trajectory estimations of the candidate VO and VIO algorithms were performed from the data.The corner points from both the rectangular reference positions and the estimated positions were matched and synchronized.The positions on the side of the rectangular trajectory were sampled using linear interpolation because the drone was moved at a constant speed.Root mean square error (RMSE) on horizontal coordinates was calculated for each candidate VO and VIO algorithm using the sampled reference and estimated positions.The RMSE results were then compared to determine the relatively best algorithm, rather than an absolute evaluation.
Figure 5 and Table 1 show that among candidate algorithms, VINS-Fusion VIO, which is an optical flowbased algorithm, exhibits the smallest RMSE.The reason for VINS-Fusion's high performance is that it uses a sparse optical flow-based Lucas-Kanade algorithm 15 , which is suitable for slow movement during airplane inspection flights.Small position deviations by slow movement are crucial in optical flow because they satisfy Taylor series approximation in optical flow algorithms and can help to avoid the aliasing problem associated with sparse optical flow 16 .
Optical flow estimates the motion vector of points using the brightness constancy condition of pixels within consecutive images.Considering pixel brightness intensity I x, y, t which is located in x, y in an image at time t , brightness constancy condition is expressed as Assuming motion between consecutive images is small, Taylor series approximation can be applied, Then, optical flow can be derived as below  u, v mean the optical flow of I x, y, t , and I x , I y , I t denote spatial and time derivatives of I x, y, t .
In conclusion, we choose an optical flow-based VIO algorithm based on its superior performance in inspection flights.This choice was made as optical flow-based VO outperforms feature-based VO, and VIO offers greater accuracy than VO due to its tight coupling with the IMU.

Real-time implementation
To achieve real-time performance in VIO, marginalization on images and features is implemented as shown in Fig. 2, and GPU resources are utilized for feature tracking.Since bundle adjustment in VIO demands substantial computing power, the target board for this research, the Jetson Xavier NX, occasionally struggles to process all the features arriving at 15 Hz.If the delayed images and feature data pile up, the time offset between real motion and VIO output increases.Therefore, an image and feature buffer marginalization logic is implemented to prevent a delay between real motion and VIO output by discarding old images and features when new ones arrive before old ones have been processed.In addition, to alleviate the high processing load on the CPU which has limited resources, and improve overall speed, we employ GPU acceleration in the feature tracking process 17 .

Ultrasonic RTLS
Ultrasonic beacon indoor GPS from Marvelmind 10 is used in a hangar RTLS owing to its small and light SWaP.As in Fig. 6a, a system suite for each flight area comprises one rover attached to a drone and four stations installed on the hangar floor.Because no beacon stations for vertical position measurement are installed in the test, only planar X and Y positions from ultrasonic RTLS localization are utilized, while vertical Z positions are not used.
In the trajectory A test (Fig. 6b), which was conducted without non-LOS conditions as shown in Fig. 1b, RTLS provides normal localization values.However, in the trajectory B test (Fig. 6c), RTLS localization is momentarily unstable due to non-LOS, caused by the crane and the drone installation setup on the crane as shown in Fig. 1a.
To address this issue, we've implemented an outlier filter (Table 2) that utilizes VIO position output as a reference.This filter rejects abnormal position deviations and repeated same position data (Fig. 6c).The outlier filter takes the inputs of p rtls cur , p vio cur , p w prv , t rtls , t vio which denote the current RTLS position output, current VIO position output, previous fusion estimated world position output, RTLS output time interval, and VIO output time interval, respectively.This filter outputs rtls_good , which indicates the validity of the RTLS position output, in order to discard RTLS output when rtls good = 0 .In this filter, considering the asynchronous arrival time of RLTS and VIO values, they are compared in the form of velocity, denoted as v rtls , v vio , where v rtls represents the position difference of previous fusion estimated state and RTLS, and v vio represents the position difference of previous fusion estimated state and VIO, respectively.Because VIO can provide an accurate position in a short amount of time, we reject the RTLS measurements if the difference between the v rtls and the v vio is greater than the threshold as described in the Eq. ( 4).The tolerable difference limit of RTLS and VIO is the sum of the RTLS www.nature.com/scientificreports/error part ( v rtls_offset ), related to its precision, and VIO error part ( v vio_tol ) associated with its drift accumulation characteristic.These values are recalculated at every filter execution time.
For the ultrasonic RTLS error ( v rtls_offset ), we calculate it by dividing the precision of RTLS, which is 2 cm according to the Marvelmind beacon specification 10 , by the time span.VIO error ( v vio_tol ) is calculated by mul- tiplying accumulated VIO velocity by a parameter ( drift_scale ), accounting for accumulating characteristics of VIO drift.If no compensation on VIO estimation by RTLS occurs, we use accumulated VIO translation ( p vio accum ) to calculate VIO error.VIO translation is reset to 0 if compensation occurs.The outlier filter starts running after an initial duration from take-off because VIO is prone to high errors by restricted camera FOV when the drone is closed to the ground.

Graph optimization fusion
Graph-based optimization is adopted as the fusion algorithm of VIO and RTLS for localization in a The graph is composed as shown in Fig. 7 and its nodes and edges are updated with VIO pose estimation and outlier filtered ultrasonic RTLS position.When a new RTLS position is inserted in the graph, an optimization process is conducted.In the graph, r i vio and r i rtls means the VIO residual and the RTLS residual at frame i , respectively.If EKF (extended Kalman filter) 18 is utilized to fuse RTLS and VIO, in the absence of RTLS, the filter continues to execute using VIO only, resulting in high error or unstable output in EKF estimation.On the other hand, the graph optimization method can stop executing optimization when RTLS is unavailable, and restart optimization when RTLS is available again.It utilized the latest graph, including VIO edges that were inserted during RTLS absence.
In this research, to achieve real-time performance in optimization calculation, the total number of nodes is constrained up to 500 using window management, which discards old nodes and edges.Because the period of RTLS localization output (8-9 Hz) is slower than VIO (15 Hz), graph optimization is conducted in the background when new RTLS data updates the graph.
For the graph optimization algorithm, VINS-Fusion graph optimization 14 is used as the backbone algorithm.State set, which represents the graph nodes, is expressed as χ = {X 0 , X 1 , X 2 , ..., X n } where i-th state X i consists of global coordinate position p i w and orientation q i w .The graph comprising state set χ and measurement set from RTLS and VIO is interpreted into a non-linear least square problem to estimate the states as below: (4)  where r(χ) � is Mahalanobis norm distance of residual r(χ) , and S1 and S2 are index sets of VIO and ultra- sonic RTLS data, respectively.is the covariance matrix for VIO or RTLS output.We used a unified diagonal covariance matrix with a value of 0.05 because the Marvelmind system's RTLS precision is known to be a few decimeters 10 .The VIO covariance matrix is also set to a diagonal matrix with values of 0.1 and 0.01 for position entries and attitude entries, respectively.The residual of VIO is where p vio i and q vio i denote the VIO estimated position and rotation in the VIO local coordinate, while p w i and q w i denote graph optimized position and rotation in the global coordinate, respectively.⊖ operation calculates pose difference.The VIO position residual takes the position in the displacement form, i.e., p vio i − p vio i−1 , rather than position form, because the VIO position accumulates drift.The RTLS residual is where p rtls i is the RTLS position measurement in global coordinate.This nonlinear least square problem is solved using Ceres solver 19 utilizing the Gaussian-Newton and Levenberg-Marquadt 20 iterative method.

Experiment Test equipment
A drone for evaluation of localization performance is shown in Fig. 8a and its onboard computers and sensors are listed in Table 3. Figure 8b illustrates the hardware and software architecture of the test drone.The Marvelmind (5)

Conclusion
In this paper, a fusion localization system of VIO and ultrasonic RTLS for airplane drone inspection is proposed.In configuring the VIO system, we determined that an optical flow algorithm utilizing stereo images with view direction turned away from an airplane is the most suitable choice for a hangar environment.To utilize ultrasonic RTLS which has inherent drawbacks in non-LOS situations, an outlier filter is suggested.Then, VIO and outlier-filtered ultrasonic RTLS localization are fused through graph optimization.Experiment results in a hangar demonstrate that the proposed system effectively compensates for accumulated drift error in VIO in case of stable RTLS conditions.Also, the test result shows that in case of unstable RTLS conditions, abnormal RTLS data is rejected and localization can continue using VIO without RTLS.We implemented image buffer and graph marginalization in VIO and graph optimization for real-time capability on the NVIDIA Xavier NX target board.Using these methods, we successfully accomplished an automated drone flight.
As future research for more stable fusion localization, we plan to develop a system that can change into RTLS fused with IMU mode in situations of VIO divergence or tracking loss.Furthermore, we are considering the use of a more powerful single-board computer to enhance calculation speed and accuracy.

Figure 3 .
Figure 3. Visual odometry and result trajectory of Aircraft view (a,b) and hangar view (c,d) by ORB-SLAM2 11 algorithm without loop closure.

Figure 4 .
Figure 4. RGB-D visual odometry experiment: manual test flight (a), 3D reconstruction by RTAB-MAP 12 algorithm (b), feature tracking on the fuselage by ORB-SLAM2 algorithm (c), RGB-D data on the fuselage (d), tracking lost on airplane tail by ORB-SLAM2 algorithm (e), and RGB-D data on tail (f).

( 3 )Figure 6 .
Figure 6.Ultrasonic RTLS test configuration in a hangar (a) and its position output (normal output (b) and irregular output (c)).

Figure 8 .
Figure 8.A drone for the experiment (a) and its hardware and software architecture (b).

Figure 9 .
Figure 9. Proposed algorithm and VIO test result: RTLS improves localization accuracy.

Table 4 .Figure 10 .
Figure 10.Proposed algorithm and VIO test result (a) and RTLS unstable region view (b): RTLS fails but localization continues.

Table 3 .
Onboard computers and sensors.

Table 5 .
Outdoor flight test result.Significant values are in bold.