Research on indoor positioning method based on LoRa-improved fingerprint localization algorithm

Traditional fingerprint localization algorithms suffer from low localization accuracy, large data volumes, and device dependence. This paper proposes a LoRa-based improved fingerprint localization algorithm, particle swarm optimization-random forest-fingerprint localization (PSO-RF-FPL), for indoor localization. The first improvement step creates a new fingerprint value (referred to as RSSI-RANGE) by adding the Time of Flight ranging value (referred to as RANGE) to the Received Signal Strength Indication (RSSI) value and weighting them together. The second improvement step preprocesses the fingerprint data to eliminate gross errors using Gaussian and median filtering. After noise reduction, the particle swarm optimization algorithm is used to optimize the hyperparameters of the random forest algorithm, and the best RSSI-RANGE value is obtained using the random forest algorithm. The Kriging method is then used for interpolation to establish an offline fingerprint database, and the final online recognition and localization are performed. Experimental results demonstrate that the first improvement step improves localization accuracy by 53–57% in different experimental scenarios, while the second improves localization accuracy by 25–31%. When both steps are combined, the localization accuracy is improved by 58–63%. The effectiveness of this method is demonstrated through experiments.

1. The cost of indoor positioning using traditional fingerprint algorithms is reduced by utilizing the characteristics of LoRa. The proposed method improves the positioning accuracy and reduces the data volume by using the RSSI-RANGE value and Kriging interpolation. The experimental results demonstrate that the PSO-RF-FPL algorithm applies to various environments and effectively improves the positioning accuracy of traditional fingerprint algorithms.
2. The algorithm's validity is demonstrated by using 3-4 gateway devices. This study provides a theoretical basis for further increasing the number of gateway devices to improve positioning accuracy and for applying the algorithm to large-scale scenarios.
The method proposed in this paper utilizes the characteristics of LoRa to reduce the cost of indoor positioning with traditional fingerprint algorithms. At the same time, RSSI-RANGE values and the Kriging method are used to improve positioning accuracy and reduce data volume.

Related work
Indoor positioning technology relies on both positioning techniques and algorithms. Currently, there are two main positioning techniques: sensor-based and signal-based. Sensor-based techniques include sound sensors [4], optical sensors [5], and infrared sensors [6], but they require the installation of numerous sensors and have low accuracy [7]. In contrast, signal-based techniques have become the focus of research because they are easy to install and offer high accuracy [8]. Consequently, various wireless signals, such as Wi-Fi, ZigBee, RFID, BLE, UWB, LoRa, Sigfox, and NFC, are widely used in indoor positioning research.
There are several algorithms for indoor positioning, including proximity algorithms, triangulation algorithms, multipoint positioning algorithms, maximum likelihood algorithms, and fingerprint positioning algorithms. Fingerprint positioning algorithms are widely used in large-scale environments such as shopping centres, markets, and campus buildings because of their characteristics. Alhomayani et al. [9] discussed the advantages and disadvantages of various fingerprint types used in indoor positioning and looked at future research directions. Cui et al. [10] proposed a method to process fingerprint data based on the fusion of a skewness-kurtosis normality test and a Kalman filter, improving the positioning accuracy by 60% compared with the traditional Kalman filter method. Lian et al. [11] proposed a KPCA-ELM joint positioning algorithm that uses the nonlinear properties of KPCA (kernel principal component analysis) to replace the original RSS data, reduce its dimensionality, and construct new features. By comparing different Wi-Fi AP signal points, the article demonstrates that the algorithm can effectively reduce the impact of noise on RSSI values and improve accuracy. Marwan Alfakih et al. [12] proposed a new fingerprint probability algorithm that can improve the accuracy of device positioning using Wi-Fi signal strength in indoor environments, with accuracy improvements ranging from 5.1 to 21.5% in different environments. However, this method is limited in practical applications by the relatively short range of Wi-Fi signals. Jait Purohit et al. [13] proposed an interpolation-assisted fingerprint positioning system architecture. They addressed the large network size and wide coverage range of LoRa by proposing a deep autoencoder method that effectively handles missing and outlier samples in the LoRa network. Suroso et al. [14] applied interpolation techniques to reduce the time and effort required to collect fingerprint data. They compared the classic pattern-matching algorithm and the minimum Euclidean distance algorithm. The latter performed better in accuracy and precision, while the random forest algorithm performed better in reducing the maximum estimation error.
Factors such as the effectiveness of RSSI data processing and the stability of LoRa indoor propagation affect the absolute positioning accuracy, so it is necessary to analyze these factors comprehensively, use appropriate hardware devices and antennas, and develop more effective RSSI data processing strategies to improve the accuracy of LoRa-based indoor positioning [15][16][17][18]. In related research fields, various methods have been proposed to solve similar problems, such as Kalman filters, moving average filters, multipath fading modelling, and calibration methods [19][20][21][22][23][24][25].

Proposed method
The fingerprinting localization (FPL) algorithm involves placing multiple wireless signal sources in a specific area. The surrounding environment affects the emitted wireless signals, forming a one-to-one correspondence between the received signal strength indication (RSSI) and the location.
FPL uses Bayesian theory to estimate the probability that the terminal node is located in each candidate area. The deployment area is defined as $X_1 \sim X_n$, where $X_i \in X$ and $X_i$ is defined by the coordinates $(x_i, y_i)$. The vector $s$ represents the RSSI-RANGE values sampled by $n$ gateways, $s = \{(\text{RSSI-RANGE}_1), (\text{RSSI-RANGE}_2), \ldots, (\text{RSSI-RANGE}_n)\}$, where $\text{RSSI-RANGE}_n$ denotes the RSSI-RANGE value sampled by the $n$-th gateway. Each location $X_i$ has a corresponding vector $s$. During the online phase, the RSSI-RANGE values of the $n$ gateways form the vector $s'$, and the goal is to find the area with the highest probability, that is, to determine $\arg\max_{X_i} p(X_i \mid s')$. According to Bayes' formula, the posterior probability is

$$p(X_i \mid s') = \frac{p(s' \mid X_i)\, p(X_i)}{p(s')} \qquad (1)$$
In the formula, $p(X_i)$ is the prior distribution of the location and $p(s')$ is the signal strength distribution, which is independent of the location. Since a uniform prior distribution is assumed here, the denominator $p(s')$ and the prior distribution $p(X_i)$ can be ignored in Bayes' formula, so only $p(s' \mid X_i)$ needs to be compared. The maximum likelihood estimation method is used to estimate the location of the terminal device. The likelihood function (2) is $p(s' \mid X_i)$, and the location estimated by formula (3) is the one at which the likelihood function is maximized.
Under maximum likelihood estimation, the denominator of the likelihood function in Eq. (2) can be ignored. $F(s', X_i)$ in Eq. (4) is obtained by summing the probabilities from all gateways.
The conditional probability $p(s' \mid X_i)$ is the probability that the gateway $gw_j$ measures the RSSI-RANGE value at location $X_i$, which is obtained by comparing $s'$ with the data in $s$.
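The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-gateway probability is modelled as a normal density around the stored value, the spread `std` is a hypothetical constant, and the names `gaussian_pdf`, `locate`, and `database` are not from the paper. The score is the sum of per-gateway probabilities, in line with the description of $F(s', X_i)$.

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density, used here as an assumed per-gateway measurement model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def locate(database, s_online, std=2.0):
    """Return the reference location whose stored vector s best explains s_online.

    database: dict mapping (x, y) -> stored RSSI-RANGE values, one per gateway.
    s_online: RSSI-RANGE values measured online, in the same gateway order.
    std: assumed measurement spread (hypothetical value).
    """
    def score(stored):
        # F(s', X_i): sum of the per-gateway probabilities.
        return sum(gaussian_pdf(obs, ref, std) for obs, ref in zip(s_online, stored))
    return max(database, key=lambda loc: score(database[loc]))

# Toy fingerprint database with three reference locations and three gateways.
database = {
    (0.0, 0.0): [-60.0, -72.0, -80.0],
    (5.0, 0.0): [-70.0, -61.0, -75.0],
    (5.0, 5.0): [-79.0, -70.0, -62.0],
}
estimate = locate(database, [-69.0, -62.5, -74.0])
print(estimate)  # (5.0, 0.0)
```

With a uniform prior, picking the location with the largest score is equivalent to picking the one with the largest posterior, which is why the prior and the denominator never appear in the code.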
This method combines the Bayesian and trilateration algorithms to achieve more accurate localization results. The initial position is calculated using the trilateration algorithm, and the Bayesian algorithm is then used to refine it. In the Bayesian algorithm, the prior and posterior probabilities are combined with the weighted distance calculated in the trilateration algorithm to obtain a more accurate position estimate. The steps for combining the weighted distances are as follows:
1. Use the trilateration algorithm to calculate the initial position, assuming the calculated target position is $(x_0, y_0)$.
2. Use the Bayesian algorithm to refine the initial position. The Bayesian algorithm is a probability-based localization method that uses prior and posterior probabilities to calculate the target position. In indoor localization, signal strength and gateway location information can be used to calculate the prior and posterior probabilities. Assume that the target position calculated by the Bayesian algorithm is $(x_1, y_1)$.
3. Weighted formulas can be based on either errors or signal strength. This method selects a weighted formula based on errors for combining the trilateration and Bayesian algorithms.
Assuming that the target position calculated by the $i$-th algorithm is $(x_i, y_i)$ and the true position is $(x_t, y_t)$, the distance between them can be represented as

$$d_i = \sqrt{(x_i - x_t)^2 + (y_i - y_t)^2}$$

The reciprocal of $d_i$ is used as the weight, i.e., $w_i = 1/d_i$. The final weighted average position can be calculated using the following formula:

$$x_f = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad y_f = \frac{\sum_i w_i y_i}{\sum_i w_i}$$

That is, the positions obtained by each algorithm are averaged according to their weights to obtain the final position coordinates $(x_f, y_f)$.
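The error-weighted combination can be illustrated with a short Python sketch. It assumes the true position is known (for example at a calibration point), since the weights are defined from the distance to it; the function name `combine` and the sample coordinates are hypothetical.

```python
import math

def combine(estimates, true_pos):
    """Inverse-error weighted average of position estimates.

    estimates: list of (x, y) positions, e.g. from trilateration and Bayes.
    true_pos: (x_t, y_t), assumed known at a calibration point.
    """
    weights = []
    for x, y in estimates:
        d = math.hypot(x - true_pos[0], y - true_pos[1])
        # Guard against division by zero when an estimate is exact.
        weights.append(1.0 / max(d, 1e-9))
    total = sum(weights)
    xf = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    yf = sum(w * y for w, (_, y) in zip(weights, estimates)) / total
    return xf, yf

# Two estimates with errors of 5 m and 2 m relative to the origin.
xf, yf = combine([(3.0, 4.0), (0.0, 2.0)], (0.0, 0.0))
print(xf, yf)
```

The estimate with the smaller error (2 m) receives weight 0.5 against 0.2 for the other, so the combined position is pulled towards it.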
Four steps are needed to perform indoor positioning using the proposed PSO-RF-FPL method: collecting RSSI and TOF measurements, data preprocessing, building an offline fingerprint database, and online positioning.The workflow diagram is shown in Fig. 2.
Data collection and preprocessing. In indoor wireless signal propagation, the RSSI values collected from the same anchor at the same location can vary continuously over time due to fluctuations of the wireless signal. Table 1 summarizes the standard deviations of RSSI and RANGE values (without filtering) obtained at specific testing points in an indoor environment. Results from multiple field tests indicate that the standard deviation of the positioning error is smaller when TOF range values are used for indoor positioning than when RSSI values are used. Assuming that the RSSI and RANGE values follow a Gaussian-like distribution, the RANGE value can be incorporated into the fingerprint feature vector to compensate for the instability of using a single RSSI value as the fingerprint feature. This improves the accuracy and stability of positioning. Because the measurement error of RSSI values is larger than that of RANGE values, a weighting scheme is adopted to increase the positioning accuracy of the fused fingerprint that uses both RSSI and RANGE values.
Assuming the RANGE value error follows a normal distribution $N(\delta_i, \sigma_T^2)$, and the RSSI distance measurement error follows a normal distribution $N(\delta_R, \sigma_R^2)$, the actual distance from the unknown node A to the gateway node $B_i$ can be calculated using the distance formula.
Since the measured distances from the unknown node to the gateway nodes $B_i$ are $d_i$, $i = 1, 2, 3, 4$, the measurement error is the difference between the measured and actual distances. The objective function for optimization is therefore formulated as the weighted sum of the squared errors, where the weighting factors for the RANGE and RSSI values are α and β, respectively. Hence, the problem of finding the approximate coordinates of the unknown node A can be transformed into a nonlinear optimization problem.
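To make the optimization problem concrete, the sketch below assumes one plausible form of the objective: for each gateway, α times the squared RANGE distance error plus β times the squared RSSI distance error, summed over gateways. The exact expression in the paper is not reproduced here, and the solver is a simple coarse-to-fine grid search used as a stand-in for a proper nonlinear optimizer. The gateway layout, the error offsets, and the names `objective` and `solve` are illustrative assumptions.

```python
import math

def objective(p, gateways, range_d, rssi_d, alpha, beta):
    """Assumed weighted sum of squared distance errors at candidate point p."""
    total = 0.0
    for (gx, gy), dt, dr in zip(gateways, range_d, rssi_d):
        dist = math.hypot(p[0] - gx, p[1] - gy)
        total += alpha * (dt - dist) ** 2 + beta * (dr - dist) ** 2
    return total

def solve(gateways, range_d, rssi_d, alpha, beta, lo, hi, rounds=6, n=20):
    """Coarse-to-fine grid search over the rectangle lo..hi."""
    (x0, y0), (x1, y1) = lo, hi
    best = None
    for _ in range(rounds):
        sx, sy = (x1 - x0) / n, (y1 - y0) / n
        candidates = [(x0 + i * sx, y0 + j * sy)
                      for i in range(n + 1) for j in range(n + 1)]
        best = min(candidates,
                   key=lambda p: objective(p, gateways, range_d, rssi_d, alpha, beta))
        # Shrink the search window around the current best point.
        x0, x1 = best[0] - sx, best[0] + sx
        y0, y1 = best[1] - sy, best[1] + sy
    return best

gateways = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
exact = [math.hypot(true_pos[0] - gx, true_pos[1] - gy) for gx, gy in gateways]
range_d = [d + e for d, e in zip(exact, (0.1, -0.1, 0.2, 0.0))]   # small TOF errors
rssi_d = [d + e for d, e in zip(exact, (1.5, -2.0, 2.5, 1.0))]    # larger RSSI errors

for gamma in (1.0, 21.0):
    est = solve(gateways, range_d, rssi_d, alpha=gamma, beta=1.0,
                lo=(0.0, 0.0), hi=(10.0, 10.0))
    print(gamma, round(math.dist(est, true_pos), 3))
```

Only the ratio of α to β affects the location of the minimum, which is why the loop varies a single ratio with β fixed at 1.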
The weight ratio $\gamma = \alpha/\beta$ in the objective function represents the relative importance of the two distance measurement errors. Three test points were selected, and the experimental results are shown in Fig. 5. When γ = 1, i.e., when the RANGE and RSSI distance measurement errors are given equal weights, the positioning error of the unknown node is the largest. As γ increases, more weight is given to the more accurate RANGE values and the positioning error decreases. However, since both distance measurement techniques have errors, the positioning error cannot be eliminated entirely. When γ increases from 1 to 15, the improvement in positioning error is significant; when γ increases from 15 to 29, the improvement is slight. Once γ reaches a certain value, the positioning error stabilizes. In this method, γ = 21 is selected. Equation (10) represents the weighted formula.
In this equation, $\text{RSSI-RANG}_i$ represents the resulting weighted value, where α and β are the weightings for the RANGE value and the RSSI value, respectively, and $\text{RANG}_i$ and $\text{RSSI}_i$ denote the RANGE value and the RSSI value, respectively.
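As a rough illustration of the fused fingerprint, the sketch below assumes the weighted value is a linear combination, α times the RANGE value plus β times the RSSI value, with the weights normalised so that α + β = 1 and α/β = γ. Both the linear form and the normalisation are assumptions made for this example, as is the expectation that the two inputs have already been scaled to comparable ranges.

```python
def fuse(rssi, rng, gamma=21.0):
    """Weighted RSSI-RANGE value per gateway (assumed linear combination).

    rssi: RSSI values, one per gateway.
    rng: RANGE values, one per gateway, in the same order.
    gamma: weight ratio alpha / beta; 21 is the value selected in the text.
    """
    beta = 1.0 / (1.0 + gamma)
    alpha = gamma * beta  # so that alpha + beta == 1 and alpha / beta == gamma
    return [alpha * r + beta * s for s, r in zip(rssi, rng)]

fingerprint = fuse([-60.0, -72.0, -81.0], [4.2, 9.8, 15.1])
print(fingerprint)
```

With γ = 21 the RANGE value contributes about 95% of the fused value, reflecting its smaller measurement error.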
In indoor localization scenarios, measurement results are affected by random additive Gaussian noise and environmental changes, which may cause measurement values to deviate from the standard bell curve. Nevertheless, the actual measurement values still exhibit the characteristics of a normal (Gaussian) distribution. Because of environmental changes during the measurement process, some values may deviate far from the mean and be affected by gross errors. To eliminate the influence of these measurements on the fingerprint localization model, this method applies Gaussian filtering to the measurement values, filtering out those that deviate far from the mean and thereby improving the accuracy and robustness of the model. For each reference point, the mean $\mu_l$ and standard deviation $\sigma_l$ of each column of the collected fingerprint matrix $R_i$ are calculated for each dimension $l = 1, 2, \ldots, 2M$. Assuming that each column of the collected training data follows a normal distribution, its probability density function $f(x)$ can be expressed as:

$$f(x) = \frac{1}{\sigma_l \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_l)^2}{2\sigma_l^2}\right) \qquad (11)$$

The mean μ of a normal distribution describes the central location of the distribution. Gaussian filtering can use the characteristics of the normal distribution to improve signal quality and construct more stable and effective fingerprint databases. In image processing, a Gaussian filter suppresses noise by replacing each pixel with a weighted average of the surrounding pixel values, where the weights follow a Gaussian function. Applied here, the method suppresses signal values far from the mean, thereby improving the stability and reliability of the signal.
A threshold probability θ is set to represent the probability that signal values fall in the interval $[\mu_l - \varepsilon_l, \mu_l + \varepsilon_l]$, where $\varepsilon_l$ can be determined from the training set data and θ. Specifically, according to Eq. (11), assuming that each column of data in the training set follows a normal distribution with probability density function $f(x)$, $\varepsilon_l$ satisfies:

$$\int_{\mu_l - \varepsilon_l}^{\mu_l + \varepsilon_l} f(x)\,dx = \theta \qquad (12)$$

After obtaining $\varepsilon_l$ from Eq. (12), the reasonable range of each element in column $l$ of $R_i$ is $[\mu_l - \varepsilon_l, \mu_l + \varepsilon_l]$. Signal values within this range are considered reliable, while those outside are unreliable. Finally, each element is checked against this range. As θ approaches 1, the original signal values are essentially not filtered, leading to larger localization errors. However, when θ approaches 0, the filtering of the samples is too strict, which may result in the loss of many valid signal values and of the characteristics of the fingerprint point. In this method, θ is set to 0.6. Figure 6 shows the variation of the average localization error of this algorithm with the threshold probability θ.
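The thresholding step can be sketched with the Python standard library. The half-width of the accepted interval (called `eps` here) follows from the normal assumption: it is the column standard deviation times the standard normal quantile at (1 + θ)/2. Whether out-of-range values are discarded or replaced is not stated in the text, so this sketch simply discards them; the function name and the sample column are illustrative.

```python
from statistics import NormalDist, mean, pstdev

def gaussian_filter(column, theta=0.6):
    """Keep only values inside the central interval of probability theta."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0.0:
        return list(column)  # all values identical, nothing to filter
    # Half-width eps with P(mu - eps <= x <= mu + eps) = theta under normality.
    eps = sigma * NormalDist().inv_cdf((1.0 + theta) / 2.0)
    return [v for v in column if mu - eps <= v <= mu + eps]

# One column of RSSI samples with a single gross error (-95 dBm).
column = [-70.0, -71.0, -69.0, -70.0, -72.0, -68.0, -70.0, -95.0]
filtered = gaussian_filter(column, theta=0.6)
print(filtered)
```

Here the outlier inflates the standard deviation, so the interval is wide enough to keep the seven consistent samples while rejecting the value at -95.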
During the data collection process, sampling points were randomly selected and data were collected over varying time periods. This approach aimed to ensure a more comprehensive and realistic representation of the collected data. Figure 7 shows how Gaussian filtering during the offline fingerprint acquisition phase affects the positioning errors of the test points. The solid line represents the localization accuracy obtained by storing the fingerprint data directly in the database without Gaussian filtering, while the dashed line represents the localization accuracy obtained after applying Gaussian filtering and then storing the data. The figure shows that, for all but one of the 10 test points, the localization accuracy after Gaussian filtering is better than that without it. This indicates that using Gaussian filtering to preprocess the signal values in each dimension during the offline fingerprint collection phase can significantly improve localization accuracy.
Figure 8 shows the RSSI values between test point e and four gateways after Gaussian filtering.
In experiments 2 and 3, obtaining the RSSI-RANGE values of the entire deployment area was not feasible: measuring them throughout the entire deployment area was costly and unnecessary. This method therefore uses the Kriging method for interpolation. Kriging is a statistical method used for spatial interpolation, in which the value at an unmeasured location is predicted as a weighted sum of the measured values:

$$\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i)$$

where $Z(s_i)$ represents the measured value at the $i$-th location, $\lambda_i$ denotes the Kriging weight associated with the measurement at the $i$-th location, $s_0$ represents the prediction location, and $N$ is the number of measured values. The calculation of Kriging weights relies on the model of the semivariogram function. Due to the influence of buildings, walls, obstacles, and other factors on LoRa signal propagation, the propagation model typically exhibits nonlinear attenuation and multipath effects. Therefore, in this research, the selected semivariogram function model is the Gaussian model, where $\gamma(h)$ represents the value of the semivariogram function, $h$ denotes the distance between known points, and $C_0$, $C_1$, and $r$ are parameters of the model. $C_0$ represents the baseline value, $C_1$ represents the magnitude of signal strength variation, and $r$ represents the scale parameter of correlation.
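A minimal ordinary Kriging sketch in pure Python may help show how the weights are obtained from the semivariogram. It assumes one common parameterisation of the Gaussian model, C0 + C1(1 - exp(-h²/r²)) for h > 0 and 0 at h = 0; the paper's exact expression and parameter values are not reproduced, and the helper names, the parameter defaults, and the toy data are illustrative. It does not use the Kriging library mentioned in the text.

```python
import math

def semivariogram(h, c0, c1, r):
    """Gaussian semivariogram model (assumed parameterisation)."""
    if h == 0.0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-(h * h) / (r * r)))

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def kriging_weights(points, target, c0=0.5, c1=10.0, r=5.0):
    """Ordinary Kriging weights; the extra row and column force them to sum to 1."""
    n = len(points)
    def gamma(p, q):
        return semivariogram(math.dist(p, q), c0, c1, r)
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    return solve_linear(a, b)[:n]  # drop the Lagrange multiplier

def predict(points, values, target, **model):
    weights = kriging_weights(points, target, **model)
    return sum(w * z for w, z in zip(weights, values))

# Four measured grid centres and an unmeasured centre in between.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
values = [-60.0, -66.0, -68.0, -73.0]
print(predict(points, values, (2.0, 2.0)))
```

Because the target in this example is equidistant from the four measured points, the weights are equal and the prediction is their plain average; for less symmetric layouts, nearer points receive larger weights.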
This method utilizes the Kriging library in Python to train the Kriging model using a fingerprint database. The centre coordinates of the grids without fingerprint points were selected as the interpolation coordinates. Table 2 shows some of the generated RSSI interpolations in Experiment 2.

Construction of offline model.
Before constructing the offline model, the obtained data were processed using the PSO-RF algorithm to obtain the optimal fingerprint values.
The Random Forest (RF) algorithm [26] is a very flexible method that can automatically handle missing data, nonlinear relationships, and combinations of multiple features. The RF model is shown in Fig. 9.
The number of trees and the number of features selected in the RF algorithm have an important impact on the performance and complexity of the model. However, they usually require manual adjustment, a tedious task involving a lot of trial and error. By using the PSO algorithm to optimize the hyperparameters of the RF algorithm, the performance and generalization ability of the model can be improved while reducing the time and effort of manual tuning.
Particle Swarm Optimization (PSO) is an optimization algorithm based on swarm intelligence [27]. The algorithm treats the optimization problem as a search for the global optimum in a multidimensional space, and each candidate solution is considered a particle in that space.
These particles search for the optimal solution by continuously adjusting their position and velocity. In the search process, Eqs. (15) and (16) are used to update the velocity and position of each particle. To improve the convergence speed of the algorithm, an inertia weight ω is introduced. Each particle is described by a position vector and a velocity; the velocity gives the distance and direction the particle will move in the next iteration.

$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left(p_{id,pbest}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(p_{id,gbest}^{k} - x_{id}^{k}\right)$$

$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$

In the equations, $k$ represents the number of iterations, ω represents the inertia weight, $c_1$ represents the individual learning factor, $c_2$ represents the group learning factor, and $r_1$ and $r_2$ are random numbers in the interval [0, 1] that increase the randomness of the search. $v_{id}^{k}$ represents the velocity of particle $i$ in the $d$-th dimension in the $k$-th iteration, and $x_{id}^{k}$ represents the position of particle $i$ in the $d$-th dimension in the $k$-th iteration. $p_{id,pbest}^{k}$ represents the historical best position of particle $i$ in the $d$-th dimension in the $k$-th iteration, i.e., the best solution found by the $i$-th particle (individual) after the $k$-th iteration, and $p_{id,gbest}^{k}$ represents the historical best position of the group in the $d$-th dimension in the $k$-th iteration, i.e., the best solution found by the entire particle swarm after the $k$-th iteration.
The PSO-RF algorithm randomly initializes a swarm of particles over the hyperparameters of the random forest, such as the number of trees (n_estimators) and the maximum depth (max_depth). It then calculates the corresponding fitness values and continuously updates the particles' velocities and positions to reach the best fitness value, thereby obtaining the optimal hyperparameters n_estimators and max_depth for the RF model, which in turn improves the convergence speed and prediction performance of the RF model.
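The search loop can be sketched as follows. To keep the example self-contained, the fitness function is a hypothetical surrogate (`surrogate_error`) standing in for the validation error of a random forest trained with the given `n_estimators` and `max_depth`; in practice it would train and score an actual model. The swarm size, iteration count, bounds, and coefficient values are illustrative choices, and the fitness is treated as an error to be minimised.

```python
import random

def pso(fitness, bounds, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise fitness over a box using the standard velocity/position updates."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the allowed hyperparameter range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def surrogate_error(params):
    """Hypothetical stand-in for the validation error of an RF model."""
    n_estimators, max_depth = round(params[0]), round(params[1])
    return ((n_estimators - 120) / 100.0) ** 2 + ((max_depth - 12) / 10.0) ** 2

bounds = [(10, 300), (2, 30)]  # ranges for n_estimators and max_depth
best, best_fit = pso(surrogate_error, bounds)
n_estimators, max_depth = round(best[0]), round(best[1])
print(n_estimators, max_depth, best_fit)
```

Positions are kept continuous during the search and rounded to integers only when the fitness is evaluated, which is one simple way to handle integer hyperparameters.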
Figure 10 compares the single RSSI fingerprint, single RANGE fingerprint, RSSI-RANGE fingerprint, and actual distance value of Gateway 1 collected in Experiment 2 after being processed by the algorithm.
Online real-time positioning phase. During actual positioning, LoRa devices are used to collect real-time RSSI-RANGE features at a specific location t times. The data are preprocessed using a median average filtering method, a signal processing technique used to reduce noise in a signal. Its main advantage is its effectiveness in the presence of outliers or impulse noise: compared with plain averaging filters, it better preserves signal edges and details because extreme values do not enter the average. The maximum and minimum values of the t RSSI-RANGE features at the same location are removed, and the remaining data are averaged to obtain a unique fingerprint feature. This fingerprint feature is then input into the offline training model obtained in section "Construction of offline model" to calculate the coordinates of the target node.
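The online preprocessing described above, dropping the largest and smallest sample in each column and averaging the rest, can be sketched as follows. The function names and the sample records are illustrative; each record is assumed to hold one value per fingerprint dimension.

```python
def trimmed_mean(samples):
    """Average after removing one maximum and one minimum value."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    kept = sorted(samples)[1:-1]
    return sum(kept) / len(kept)

def online_fingerprint(records):
    """Column-wise trimmed mean over the t collected records."""
    return [trimmed_mean(list(col)) for col in zip(*records)]

# Four records with two dimensions each (an RSSI value and a RANGE value).
records = [
    [-70.0, 5.0],
    [-71.0, 5.2],
    [-90.0, 9.0],   # impulse noise in both dimensions
    [-69.0, 5.1],
]
print(online_fingerprint(records))
```

The noisy record contributes the minimum RSSI and the maximum RANGE, so it is removed from both averages.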
At a particular location, LoRa devices continuously collect fingerprint feature vectors t times, resulting in t records denoted as:

$$W = \begin{bmatrix} R_1^1 & R_2^1 & \cdots & R_8^1 \\ R_1^2 & R_2^2 & \cdots & R_8^2 \\ \vdots & \vdots & & \vdots \\ R_1^{10} & R_2^{10} & \cdots & R_8^{10} \end{bmatrix}$$

For $R_l^t$, when $l = 2j - 1$ (odd columns), it represents the RSSI fingerprint value from the $j$-th gateway device obtained in the $t$-th collection; when $l = 2j$ (even columns), it represents the range fingerprint value obtained based on the ranging engine mode from the $j$-th gateway device in the $t$-th collection. Here, $l = 1, 2, \ldots, 8$, $j = 1, 2, \ldots, 4$, and $t = 1, 2, \ldots, 10$. For each column, the maximum and minimum values are removed, and the average value of the remaining data is denoted as $R'_l$. The average value vector of W is then $(R'_1, R'_2, \ldots, R'_8)$.

Enter the offline model. The obtained average vector of W is then used as input to the offline training model constructed in section "Construction of offline model". The model calculates the two-dimensional coordinates (x, y) of the location to be determined, which are taken as the final position coordinates. In this way, the actual position of the test point can be estimated.

The central position coordinates of the $i$-th grid point area are denoted as $P_i = (x_i, y_i)$, and the fingerprint matrix $R_i$ collected in the $i$-th grid point area is denoted as follows:

$$R_i = \begin{bmatrix} R_{i,1}^1 & R_{i,2}^1 & \cdots & R_{i,6}^1 \\ \vdots & \vdots & & \vdots \\ R_{i,1}^{200} & R_{i,2}^{200} & \cdots & R_{i,6}^{200} \end{bmatrix}$$

For $R_{i,l}^k$, when $l = 2j - 1$ (odd column), it indicates the RSSI fingerprint value obtained from the $j$-th gateway device based on the communication mode in the $k$-th collection at the $i$-th fingerprint grid point area. When $l = 2j$ (even column), it represents the RANGE value obtained from the $j$-th anchor device based on the ranging engine mode in the $k$-th collection at the $i$-th fingerprint grid point area, where $i = 1, 2, \ldots, 55$, $j = 1, 2, 3$, $l = 1, 2, \ldots, 6$, and $k = 1, 2, \ldots, 200$.
As shown in Fig. 12a-c

Narrow and elongated indoor corridor environment. The experimental environment for the narrow indoor corridor is shown in Fig. 13; it is 72 m long and 2 m wide, with an area of approximately 144 m². A total of 48 fingerprint points and 41 test points were selected in the entire experimental area. Four gateways were deployed from left to right, numbered 1 to 4.
Similar to the previous experiment, LoRa gateways were deployed at fixed positions (15, 1), (30, 1.5), (45, 0.5), and (60, 1), and the remaining grids were interpolated using the Kriging model.The fingerprint collection method was the same as described in section "Rectangular indoor environment".
Cross-room indoor environment. The experimental environment and floor plan of the indoor environment with multiple rooms are shown in Fig. 14; the area is 24 m long and 14 m wide. A total of 83 fingerprint points and 72 test points were selected in the entire experimental area. Four gateways were deployed, with Gateway One at the bottom left, Gateway Two at the bottom right, Gateway Three at the top right, and Gateway Four at the top left.
Experimental results. The experimental results for the three scenarios are shown in Table 3.
In Experiment 2, comparing cases (1), (2), and (3) shows that the third case, which uses both RSSI and RANGE values for fingerprinting, has an average positioning error of 0.82 m and an SSD of 1.21 m. In the second case, where only RSSI values are used, the average positioning error is 1.87 m and the SSD is 1.32 m, while in the first case, where only RANGE values are used, the average positioning error is 0.94 m and the SSD is 1.19 m. Figure 15 shows the error comparison of 30 testing points using different fingerprinting values.
Comparing cases (3), (4), and (6) shows that, without using the RF algorithm to optimize the data, the average positioning error is 1.17 m and the SSD is 1.54 m. After processing the data using the RF algorithm optimized by the PSO algorithm, the average positioning error is the lowest at 0.82 m, with an SSD of 1.21 m. Figure 16 shows the error comparison of 30 testing points using different algorithms. Comparing experiments 1, 2, and 3 shows that the presence of many electronic devices and obstacles, such as walls, in the indoor environment leads to a slight decrease in positioning accuracy. However, the proposed method is still effective in improving positioning accuracy. Figure 17 shows the overall error comparison of experiments 1, 2, and 3.
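As a quick check of how the average errors reported for Experiment 2 relate to the percentage improvements quoted in the abstract, the arithmetic can be written out. This assumes that "improvement in accuracy" means the relative reduction in average positioning error, which is an interpretation rather than a definition given in the text.

```python
def improvement(baseline_error, improved_error):
    """Relative reduction in average positioning error, in percent."""
    return 100.0 * (baseline_error - improved_error) / baseline_error

# Average errors in metres, taken from the Experiment 2 results above.
rssi_only, fused = 1.87, 0.82
without_rf, with_pso_rf = 1.17, 0.82

print(round(improvement(rssi_only, fused), 1))         # 56.1
print(round(improvement(without_rf, with_pso_rf), 1))  # 29.9
```

Under this interpretation, both values fall inside the ranges stated for the first step (53–57%) and the second step (25–31%).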

Conclusion
In this paper, we address a weakness of traditional fingerprinting localization (FPL): the susceptibility of the Received Signal Strength Indication (RSSI) to interference, which leads to poor model quality during offline training. We propose two improvements: (1) the addition of a Time of Flight (TOF) ranging fingerprint, which improves fingerprint stability by forming new weighted fingerprint values, and (2) Gaussian and median filtering for fingerprint preprocessing during fingerprint database creation and online positioning, respectively. We eliminate gross fingerprint errors and use the Kriging method for interpolation, followed by the PSO-RF-FPL algorithm to build an optimal offline fingerprint database. We demonstrate the effectiveness and reliability of our proposed approach through on-site indoor experiments.
The results show that adding TOF ranging values in the three experimental scenarios increased the positioning accuracy by 53-57% while using the PSO-RF-FPL algorithm improved the positioning accuracy by

Figure 3 shows 200 RSSI values collected using LoRa devices at several test points, with the antenna facing vertically upward. In indoor positioning applications, wireless signals are affected by various factors during propagation, such as multipath effects, signal attenuation, and damping, resulting in complex time-varying RSSI values at the same location. Traditional fingerprint positioning algorithms typically use RSSI values as the feature vector of fingerprint signals. However, due to the instability of RSSI values, the model is prone to overfitting during offline training, leading to poor generalization performance and significant prediction errors in real-time positioning. Therefore, this method uses both RSSI and TOF ranging values to form a new fingerprint value. Figure 4 shows 200 measurements of ranging values obtained at several test points.

Figure 3. RSSI values at specific test points.

Figure 6. The average positioning error varies with threshold probability θ.

Figure 7. Gaussian filter processing data comparison chart.

Figure 8. The RSSI value of test point e.
In Fig. 12, a, b, and c represent the RSSI values collected by three gateways at test point 1, while d, e, and f represent the RANGE values collected by three gateways at test point 1.

Figure 15. Error comparison of different fingerprint values.

Figure 16. Comparison of different algorithm errors.

Figure 17. Comparison of the overall error of the experiment.

Table 1. The standard deviation of RSSI and TOF ranging values at each test point.

Table 3. Average positioning error for six cases.