Abstract
For multi-dimensional high-order nonlinear systems with unstable path quality in parameter and extension terms, we developed a new fast search random tree strategy. First, we established a high-order Lipschitz vector field dynamic system to adapt to high-order systems of multi-degree-of-freedom robots, with the complex obstacle function being one of its key components. Secondly, we designed a classification gap filtering network layer (Classification LSTM) to screen training data models and ensure the global stability of data in path design. Additionally, the visual sensors deployed in the unit area effectively implement the path marking backtracking strategy and dead zone path simplification. Finally, three examples are provided to verify the effectiveness of this design method.
Similar content being viewed by others
Introduction
In recent years, interest in path planning design has increased due to the unstable path planning quality of smart devices in complex environments1,2,3. Rapidly-exploring random tree (RRT)4,5,6 is an attractive method for planning design because it can efficiently search the planning space and generate approximately optimal paths, making it suitable for application scenarios with high real-time requirements, such as dynamic environments7,8.
The first paper to use the RRT algorithm in path planning appears to be9. Since then, there has been significant interest in improving RRT algorithms to optimize path planning quality. Lan and Jin10 proposed an improved RRT algorithm based on environmental ocean currents, which can adapt to the complex environment of real ocean currents and reduce the length of planned paths. An and Kim11 studied an adaptive step-size RRT path planning algorithm using the maximum Cartesian displacement constraint. The advantage of this algorithm is that it uses a minimum obstacle size parameter and an operator-induced matrix to determine the maximum allowable step size, offering practical benefits over fixed step-size RRT planning algorithms. An interesting application12 is to use this constraint for parallel trajectory planning in dynamic environments.
In13, a velocity planning method with minimum jitter was proposed to improve the smoothness and stability of mobile robots during path tracking. However, due to the presence of unknown parameters in the observed data, stability is limited. Zhou and Pan14 proposed a perception-adaptive trajectory replanning method called Raptor, which dynamically adjusts the flight trajectory based on environmental perception to cope with sudden obstacles and environmental changes. Nilles and Ren15 designed a non-deterministic rebound strategy based on visibility calculation, analyzing the relationship between the robot and the visible area in the environment to design an effective path planning strategy in dynamic and unpredictable environments.
It should be noted that the path planning results of the above-mentioned RRT algorithms and other extended technologies are focused on general linear16 or nonlinear systems. However, in multi-degree-of-freedom robot systems, there are often high-order nonlinear terms17. The parameters and extensions of such high-order nonlinear terms lead to unstable path planning quality. Therefore, it is imperative to study an improved RRT method based on multi-dimensional high-order nonlinear systems.
Based on these observations, we developed an improved RRT behavior planning method for robots using the PTM algorithm. The PTM algorithm is an optimization algorithm that includes a high-order Lipschitz vector field dynamic system, Classification LSTM, and a limited area sensor deployment design. The design model of the method in this study is shown in Fig. 1.
Figure 1 shows a comprehensive path planning and optimization system involving three main modules: the nonlinear system, the RRT (rapidly-exploring random tree) algorithm, and PTM (path tracking management). The nonlinear system is processed by Markov and complex obstacle functions and optimized using a gap filtering network layer. The RRT module performs path searches through random sampling and multidimensional systems and interacts with the PTM module. The PTM module includes sensor deployment, path marking, and dead zone simplification, and performs path optimization and management through backtracking and matrix generation. The overall system provides a high-dimensional and multi-order path planning method through these interconnected submodules. The contributions of this article are threefold:
-
(1)
For general linear16 or nonlinear systems in static environments, in addition to existing results18, we also consider a high-order Lipschitz vector field model. It establishes dynamic systems based on Markov nonlinear analysis. Compared with the fixed step size of11, it has the planning flexibility within the safety limit curve. We also optimize the high-order Lipschitz vector field model based on the kinematics of the Differential Drive Robot (DDR), and simulate the planning process through three-dimensional scene experiments to make it more suitable for real mobile robots.
-
(2)
A Classification LSTM is designed. Although our design is driven by12, due to the existence of nonlinear parameter arbitrary terms and extension terms in multi-dimensional high-order path planning, the path planning quality is unstable. To overcome this difficulty, we propose a Classification LSTM.
-
(3)
A deployment method of visual sensors19 in a limited area is established. This method can efficiently implement two path optimization strategies: path marking backtracking strategy and dead zone path simplification. Compared with9,10,11,12,13,14,15, it has better extended nodes and planning cycle.
The remainder of this article is organized as follows. “High-order Lipschitz vector field dynamic system” Section establishes a high-order Lipschitz vector field dynamic system. “Classification LSTM” Section designs a dynamic system based on a Classification LSTM. “Deployment method of visual sensors in a limited area” Section discusses the establishment process of the method for deploying visual sensors in a limited area. In addition, the design principle of the improved RRT is given. “Experiment analysis” Section provides three examples to illustrate the theoretical results. “Conclusion” Section concludes this article and points out the research limitations and future work directions.
High-order Lipschitz vector field dynamic system
The high-order Lipschitz vector field dynamic system is a mathematical model used to describe and control multidimensional high-order nonlinear systems. It uses the Lipschitz condition to ensure the stability of the system in a dynamic environment. Compared with the traditional Lipschitz vector field20, it can more accurately handle the complex dynamic behavior of multidimensional high-order nonlinear systems.
Higher-order Lipschitz vector field theory based on Markov model
The Markov model21 is a statistical decision model. It is introduced into the high-order Lipschitz vector field theory to handle uncertainty and dynamic environments in path planning. This is because most actual planning scenarios are dynamically changing, including the movement of equipment, people, and other unpredictable factors. The Markov model can effectively predict and respond to changes in the environment by describing the transition probability of the system state, thereby improving the accuracy of path planning.
Assume that \(\left\{ {X\left( t \right), t \in T} \right\}\) is the random process of system operation. If for any positive integer n and \(t_{1} < \cdots < t_{n}\), \(P\left( {X\left( {t_{1} } \right) = x_{1} ,X\left( {t_{2} } \right) = x_{2} , \ldots ,X\left( {t_{n - 1} } \right) = x_{n - 1} } \right) > 0\), the conditional distribution satisfies:
Then we call \(\left\{ {X\left( t \right),t \in T} \right\}\) a Markov process.
In a nonlinear mechanism, different operating parameters of the robot will produce different operating states. These different operating states will generate different original signals, which then stimulate the target to enter the next cycle of judgment. Therefore, a state storage matrix is introduced to store these different operating states.
where \(PR\) is the state storage matrix. \(P_{xy}\) represents the state parameter probability.
Establish a nonlinear safety system:
where \(x\) is the parameter state of the nonlinear mechanism. \(f\left( x \right)\) is a local Lipschitz continuous vector field.
For a local Lipschitz continuous vector field, a continuous state function \(g\left( x \right)\) is given, and a restriction condition is added, namely the constant \(K \ge 0\). So that in the state set area of the function \(g\left( x \right)\), any two different parameters satisfy
We call this local Lipschitz optimal. Among them, the weight is \(w_{i}\). The bias is \(b_{i}\). The activation function is \(h\left( x \right)\). The system stability constant is \(C\). For any two parameters \(x_{a}\) and \(x_{b}\), there exists:
When and only when the vector derivative of the i-th layer is greater than 1, it is the most applicable moment for the activation function, at which time the rotation degree of the microvector field turns to the local Lipschitz vector field. That is, a Lipschitz vector field belonging to \({\mathbb{R}}^{n}\) is formed.
Consider the multi-state changes of state parameters. Therefore, a multi-dimensional high-order nonlinear safety system is established based on the above nonlinear basic expression:
where \(x\) is the real-time parameter state. \(y\) is the predicted parameter state. \(n\) is the high-dimensional state. \(f\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)\) is a multi-dimensional local Lipschitz continuous vector field.
Establish a safety set for a multi-dimensional high-order nonlinear safety system:
where \(S\) is the safety set. \(x\) and \(y\) are parameter states. \({\mathbb{R}}^{n}\) is the n-th dimension based on the continuous Lipschitz vector field. \(g\left( x \right)\) is the state function. \(h\left( x \right)\) is the activation function.
Introduce the safety boundary curve. It will establish different coordinate base points as the real-time position of the robot changes. The boundary point is close to the critical point of the safety set.
where \(\ell\) represents the safety boundary curve. \(R\) represents the real-time safety radius. \(x_{t}\) and \(y_{t}\) are the predicted coordinates of the robot in the next cycle. \(x_{robot}\) and \(y_{robot}\) are the real-time coordinates of the robot. \(k_{t}\) is the equilibrium error constant. It is positive when the predicted position of the robot is close to the target position and negative when it deviates from the target position.
Remark1
The safety limit curve is a critical function in a multidimensional high-order nonlinear system. It has the following properties.
-
(1)
When the state parameter is within the critical function, the system is in an absolutely stable state.
-
(2)
When the state parameter falls between the critical function and the safety set, the system is in a relatively stable state.
-
(3)
When the state parameter falls outside the safety set, the system is in an unstable state.
When the safety set is in normal conditions, the parameter probability of the Markov transition matrix changes with time. Therefore, we establish a high-order Lipschitz vector field dynamic system based on the Markov model. This system can switch critical functions and maintain the stability of high-order nonlinear systems.
where \(x = \left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)^{T} \epsilon R^{n}\). The physical meaning of each component represents a different action state. \(f\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)\) is a multidimensional local Lipschitz continuous vector field. \({\text{y}}\) is the output of the system. \({\text{m}}\) represents the system observation coefficient. When \(p > 1\) is an odd number, C is the stability constant of Lipschitz. \(f\left( {x_{n - 1} } \right)\) is a fixed value when the Lipschitz rotation is \(\frac{\pi }{2}\).
For \(x = \left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)^{T}\), set \(x_{1}\) to be the initial state of the robot. The time interval is \(0 \to t_{1}\). The terminal time of the time interval is the real time of state \(x_{1}\). Because the complexity of each exploration cycle is different, there are three different situations: \(t_{1} > t_{2} ,t_{1} < t_{2}\) and \(t_{1} = t_{2}\). According to the principle of time accumulation, \(t_{x1} < t_{x1} < \cdots < t_{xn}\) can be obtained. When the robot does not collide and arrives at the destination safely, the probability of each state must be greater than zero. Otherwise, the state of the system will be interrupted at a certain time node \(t_{xn}\). Therefore, the purpose of the high-order Lipschitz vector field dynamic system we established based on Markov is to make the system close to the ideal state, that is, \(P\left( {X\left( {t_{x1} } \right) = x_{1} , \ldots ,X\left( {t_{xn} } \right) = x_{n} } \right) > 0\). We can partition different time periods for processing. That is, "windowing". Given \(m = 0.2, p = 2, m = 0.6, p = 5\). The system stability constant is \(C = m/p\). \({\mathcal{H}}\) represents the general term for different state parameters. The high-order Lipschitz vector field dynamic system can be expressed as:
The robot we consider is a 6-DOF robot planning test in three-dimensional space. To adapt to a more general DDR, we define the following variables. \(x_{ddr} ,y_{ddr}\) and \(z_{ddr}\) are the current position coordinates of the robot. \(\theta_{ddr}\) is the current direction angle of the robot (the angle relative to the global coordinate system). \(v_{r}\) and \(v_{l}\) are the linear velocities of the right wheel and the left wheel respectively. \(\omega_{rl}\) is the angular velocity. \(q_{ri}\) is the angle of the i-th joint. \(I_{ri}\) is the moment of inertia. \(\tau_{ri}\) is the joint torque. \(C_{rij}\) is the Coriolis torque matrix. \(G_{ri}\) is the gravity term. Then the restriction term in the 6-DOF joint is expressed as \(\dot{q}_{ri} = \frac{1}{{I_{ri} }}\left( {\tau_{ri} - \mathop \sum \limits_{j = 1}^{6} C_{rij} \dot{q}_{rj} - G_{ri} } \right) {\text{for i}} = 1,2,...,6\). \(L\) is the distance between the two wheels. Then the continuous kinematic equation is discretized with a step size of \(\Delta t\). The discretized kinematic equation is
where \(x_{ddr}^{k} ,y_{ddr}^{k} ,z_{ddr}^{k}\) and \(\theta_{ddr}^{k}\) are the states of the robot at the kth step. \(x_{ddr}^{k + 1} ,y_{ddr}^{k + 1} ,z_{ddr}^{k + 1}\) and \(\theta_{ddr}^{k + 1}\) are the states of the robot at the k + 1th step. System (11) is extended to three dimensions. Set the path point \(P_{0} ,P_{1} , \ldots ,P_{n}\). The coordinates of each path point are represented by \(\left( {x_{i} ,y_{i} ,z_{i} } \right)\). \(P_{0} ,P_{1} , \ldots ,P_{n}\) are equivalent to extended nodes in the following text. The smooth curve of the trajectory in path planning can be represented as
where \(b_{i} \left( t \right)\) is the spline basis function. Let \(q = \left[ {q_{r1} ,q_{r2} , \ldots ,q_{r6} } \right]^{T}\). \(\tau = \left[ {\tau_{r1} ,\tau_{r2} , \ldots ,\tau_{r6} } \right]^{T}\). Then the Lagrangian of 6 degrees of freedom gives \({\mathbf{\mathcal{X}}}\left( q \right)\user2{\ddot{q}} + C_{rij} \left( {q,\dot{q}} \right) + G_{ri} \left( q \right) = \tau + h\left( {\dot{q}} \right)\). \(h\left( {\dot{q}} \right)\) is the activation function vector about \(\dot{q}\). Then under the action of the high-order Lipschitz vector field dynamic system, the 6-degree-of-freedom dynamic model can be expressed as \(\mathop \sum \limits_{i = 1}^{6} J_{ri}^{T} \left( q \right)I_{ri} J_{ri} \left( q \right)\user2{\ddot{q}} + \mathop \sum \limits_{{{\varvec{i}} = 1}}^{6} \mathop \sum \limits_{{{\varvec{j}} = 1}}^{6} \left( {\frac{1}{2}\left( {\frac{{\partial {\varvec{m}}_{{{\varvec{rij}}}} }}{{\partial {\varvec{q}}_{{\varvec{k}}} }} + \frac{{\partial {\varvec{m}}_{{{\varvec{rik}}}} }}{{\partial {\varvec{q}}_{{{\varvec{rj}}}} }} - \frac{{\partial {\varvec{m}}_{{{\varvec{rjk}}}} }}{{\partial {\varvec{q}}_{{{\varvec{ri}}}} }}} \right)\dot{\user2{q}}_{{{\varvec{rj}}}} } \right)\dot{q} + \mathop \sum \limits_{i = 1}^{6} J_{ri}^{T} \left( q \right)h\left( {\dot{q}} \right) + F_{c}\). \(J_{ri} \left( q \right)\) is the Jacobian matrix of the i-th joint. \(\frac{{\partial {\varvec{m}}_{{{\varvec{rij}}}} }}{{\partial {\varvec{q}}_{{\varvec{k}}} }}\) is the partial derivative of the inertia matrix element with respect to the joint angle. \(F_{c}\) is the influence of the joint friction force on the joint motion of the right and left wheels. The system state equation of DDR is expressed as
Assume that the system state variable is \({\rm X}_{ddr} = \left[ {x_{ddr}^{k} ,y_{ddr}^{k} ,z_{ddr}^{k} ,\theta_{ddr} ,x_{1} ,x_{2} , \ldots ,x_{n} } \right]\). According to the high-order Lipschitz vector field, we get:
Remark 2
According to the Markov characteristic, when the steady state of the system takes different values, the smoothness of the high-order nonlinear system is different. The parameter solution of each state is unique and asymptotically stable everywhere. When the steady state value of the system is outside the safe set, the stability constant expression is meaningless.
Complex barrier function
The extended terms generated by the “windowing” processing of the planning time will cause "redundant terms" in the dynamic system. Therefore, it is very important to establish a complex barrier function in the high-order Lipschitz vector field dynamic system. The main design idea of the complex barrier function is to effectively avoid repeated motion and speed up the system operation mechanism by repeatedly setting the barrier function.
Consider a multi-dimensional high-order nonlinear safety system, namely \(\dot{x}\dot{y}\dot{n} = f\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)\). Based on the above, the barrier function is constructed. The basic idea of the barrier function is to construct a scalar function \(h\left( x \right)\) based on the structural information and safety constraints of the system. To detect the invariance of the safety set. Now for the high-dimensional state, let \(G\left( x \right) = - h\left( x \right)\). If:
where \(L_{f}\) is the Lyapunov function, which is a scalar function used to determine the stability of the system. The safety set at this time is a stable and unchanged state set. \(L_{f} \cdot G\left( x \right)\) represents the Lie derivative of \(G\left( x \right)\) along the vector field \(L_{f}\). The scalar function \(h\left( x \right)\) is the barrier function that needs to be constructed in the system. It plays a vital role in the safe operation of the system. The complex barrier function is denoted as \(ch\left( x \right)\). It satisfies condition (15).
Figure 2 is the calibration principle of the complex obstacle function. The yellow circle in the figure represents an obstacle. It can be represented as a circle with a center point of \(O_{i}\) and a radius of \(r_{i}\). \(O_{i} = \{ point \in {\mathbb{R}}^{n} \left| \right|\left| {point - O_{i} } \right|| \le r_{i} \}\). \(point\) is an arbitrary point and \(\left| {\left| \cdot \right|} \right|\) is the Euclidean distance. Safe Distance refers to the minimum distance that the robot must maintain from the obstacle during path planning. This minimum allowed distance can be represented as \(d_{safe}\). It is a preset constant. To ensure the reasonable size of \(d_{safe}\), we define an extended obstacle area \(O_{i}^{\prime }\). That is, the blue area. \(O_{i}^{\prime } = \{ point \in {\mathbb{R}}^{n} \left| \right|\left| {point - O_{i} } \right|| \le r_{i} + d_{safe} \}\). Then the complex obstacle function can be represented as \(ch\left( x \right) = \left\{ {\begin{array}{*{20}l} {\infty , {\text{if}} point \in O_{i}^{\prime } {\text{for any}} i} \hfill \\ \hfill \\ {0, {\text{otherwise}}} \hfill \\ \end{array} } \right.\). When the robot enters the extended obstacle area, a very large value (\(\infty\)) will be given, forcing the path planning algorithm to avoid these areas. The path pointed by the red arrow is the Infeasible path. It means that the robot will enter the safe distance range of the obstacle if it takes this path, resulting in a collision. If the original position of the robot is marked as an obstacle function, so that \(L_{f} \cdot G\left( x \right) > 0\), then the original position is an unsafe area. In the next cycle, the robot will automatically avoid this position to avoid repeated exploration of the original position. This reduces the data complexity of the next cycle.
Theorem 1
For system (6), different states start from the safe set S. After a series of planning tasks, they always stay inside the safe set S. Then the high-order system is safe.
In order to expand the complex obstacle function to high-dimensional space, we combine the 6-DOF robot dynamics equation to illustrate the applicability of the complex obstacle function in high-dimensional space. Given \({\mathbf{\mathcal{X}}}\left( q \right)\user2{\ddot{q}} + C_{rij} \left( {q,\dot{q}} \right) + G_{ri} \left( q \right) = \tau + h\left( {\dot{q}} \right)\). For the high-dimensional state \(q\), introduce the complex obstacle function \(ch_{q}^{ob} \left( q \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\frac{{d^{2} - \left| {\left| {q - q_{i} } \right|} \right|^{2} }}{{d^{2} }}} \right)\). \(q_{i}\) is the position of the i-th obstacle, and \(d\) is the safe distance between the robot and the obstacle. In order to ensure the effectiveness of the complex obstacle function in high-dimensional space, we incorporate it into the dynamics equation and define the penalty function \({\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right)\) to deal with the obstacle problem in high-dimensional space.
The complex obstacle function is introduced into the dynamic equation of the 6-DOF robot to obtain \({\mathbf{\mathcal{X}}}\left( q \right)\user2{\ddot{q}} + C_{rij} \left( {q,\dot{q}} \right)\dot{q} + G_{ri} \left( q \right) + h\left( {\dot{q}} \right) + {\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right)I_{ri} = \tau\).
\({\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right)\) is used to calibrate the complex obstacle function. \({\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right) = {\mathbf{\mathcal{P}}}^{{\mathbf{k}}} ch_{q}^{ob} \left( q \right)I_{ri}\). \({\mathbf{\mathcal{P}}}^{{\mathbf{k}}}\) is the penalty coefficient. Define the position and orientation angle of each joint in the system state vector \(q\). \(q = \left[ {q_{r1} ,q_{r2} , \ldots ,q_{r6} } \right]^{T} \to \left[ {x_{ddr}^{k} ,y_{ddr}^{k} ,z_{ddr}^{k} ,\theta_{ddr}^{x} ,\theta_{ddr}^{y} ,\theta_{ddr}^{z} } \right]^{T}\). The complex obstacle function can be expressed as \(ch_{q}^{ob} \left( q \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\frac{{d^{2} - \left| {\left| {q - q_{i} } \right|} \right|^{2} }}{{d^{2} }}} \right)\). The working mechanism is as follows. \(ch_{q}^{ob} \left( q \right)\) will change dynamically according to the distance between the robot and the obstacle. When \(d\) decreases, \(\left| {\left| {q - q_{i} } \right|} \right|\) decreases, causing \(ch_{q}^{ob} \left( q \right)\) to increase. Assuming that the position of the robot at a certain moment is \(q_{t}\) and the position of the obstacle is \(q_{s}\), it can be seen from \(ch_{q}^{ob} \left( q \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\frac{{d^{2} - \left| {\left| {q - q_{i} } \right|} \right|^{2} }}{{d^{2} }}} \right)\) that \(\left| {\left| {q_{t} - q_{s} } \right|} \right| \approx 0\). Causes \(ch_{q}^{ob} \left( q \right)\) to increase, thereby \({\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right)\) increases. Through this mechanism, we can ensure that the robot's path planning in high-dimensional space always avoids obstacles, thereby maintaining the safety and stability of the system.
The Markov high-order Lipschitz vector field dynamic system can be expressed as
where \({\mathbf{\mathcal{P}}}\left( {ch_{q}^{ob} \left( q \right)} \right)\) is the penalty function. It completes the conversion of the next cycle by adding the penalty function of the path that has been taken in the previous cycle. In summary, we can establish a related model system. Figure 3 shows the nonlinear control model system established based on robot behavior planning.
The system is a robot-based behavior planning and decision-making model. It enables the robot to move and make decisions in a complex environment through a multi-dimensional high-order nonlinear system and a complex obstacle function. The system uses a Markov decision model, which can analyze and evaluate the robot’s state and environment, and then make the best decision based on the evaluation results. In addition, the system also takes the existence of obstacles into account. It uses an obstacle function to restrict the robot's movement trajectory, so that the path planning can always stay within the safe set, that is, satisfy Theorem 1.
Classification LSTM
Long Short-Term Memory (LSTM)22 is a variant of Recurrent Neural Network (RNN) that is specifically designed to process and predict time series data. LSTM solves the gradient vanishing and gradient exploding problems of traditional RNN when processing long sequence data by introducing memory units and gating mechanisms. Therefore, LSTM performs well in processing long-term dependencies and complex nonlinear data. Its structural design diagram is shown in Fig. 4.
In Fig. 4, LSTM is mainly composed of input gate, forget gate, output gate and several cell states. There are four S-function units at the bottom. They control the conduction and cutoff of the gate by judging the data. The three on the right will pass through the transmission gate to determine whether the input is passed into the block. The second one on the left is the input gate. If the input is close to zero, it will not enter the next layer. The third one on the left is the forget gate. When the generated value is close to zero, the data in the block will be discarded. The rightmost is the output gate, which determines whether the gate in the block memory is output. The input gate calculates a value between 0 and 1 based on the current input and the hidden state of the previous moment through a sigmoid activation function, indicating which information needs to be retained. The forget gate determines which information in the memory cell needs to be forgotten. It also calculates a value through a sigmoid activation function to filter the memory cell state of the previous moment. In this way, LSTM can discard historical information that is no longer relevant and only retain important information related to the current path planning. The output gate controls the output information of the memory cell. It determines which information needs to be output to the hidden state of the next moment. However, a large amount of data is generated in each time period of multi-dimensional high-order system path planning, which requires traditional LSTM to store and update a large number of weights and intermediate states. Therefore, we designed a new LSTM (Classification LSTM) based on the high-order Lipschitz vector field. Figures 5 and 6 are the structural design diagram and application network model diagram of Classification LSTM respectively.
As shown in Fig. 5, the Classification LSTM is mainly composed of LSTM and Gaussian mixture model (GMM). The output \(h_{t}\) of LSTM is passed to the Gaussian mixture classification module. \(h_{t}\) is generated by the memory cell state and output gate control. It is output by setting the threshold \(\theta\) and the activation function \(y_{i}\). Then, multiple Gaussian distributions \(\left( {\omega_{1} ,\omega_{2} , \ldots ,\omega_{n} } \right)\) are generated after passing through the Gaussian mixture classification module, and each distribution corresponds to a category. This model combining LSTM and GMM can make full use of the time series modeling ability of LSTM and the classification ability of GMM, and is suitable for complex time series classification tasks. In path planning, data of different cycles have complex time series associations. Through Classification LSTM, these time dependencies can be better modeled. GMM can perform more refined classification based on the LSTM output. By generating multiple Gaussian distributions, it can more accurately correspond to different path planning decision categories.
In Fig. 6, the model starts from the starting point, initializes the necessary parameters and variables, and the data flow is transmitted through information simplification and information channels. In the input gate stage, LSTM decides which information to retain based on the current input and the hidden state of the previous moment, while the output gate controls the information to be transmitted to the next cycle. The information simplification module preprocesses the input data, removes redundant information, and improves processing efficiency. The model obtains the necessary cycle data from the external environment and trains the Classification LSTM through the training sample pool and training data. The first Classification LSTM module uses the time series modeling capability of LSTM and combines GMM for preliminary classification to generate multiple Gaussian distributions, each of which corresponds to a possible path planning decision. The second Classification LSTM module further processes and classifies the data to improve the accuracy and reliability of the decision.
Remark3
Set the receiving gate of the data to a four-tuple \(S_{p} = \left( {\rho ,\alpha ,\mu ,\tau } \right)\). The data accepted by each pole gate can be equivalent to a small Datebase. Set input to the incoming block, and the output of the input gate is not zero. The loss function is represented by Loss.
where \(\rho\) and \(\alpha\) are the input of the initial layer transfer gate respectively. \(Loss\left( {\rho /\alpha } \right)\) is the loss function of the initial layer. \(Qs_{p1} \left( {\rho ,\alpha } \right)\) represents the value tuple function after passing through the transfer gate. \(y_{i}\) is the initial value of the sigmoid function. Here, an initial value of the activation function is given to control data loss. \(f\left( {x_{i} , \theta } \right)\) represents the predicted data information in the unknown detection process. \(x_{i}\) is the true value of the data from the i-th layer transfer gate. \(\theta\) represents the threshold. The periodic data meets the Classification LSTM output terminal
Based on the above observations, with the help of LSTM to capture time dependency and GMM’s multimodal classification capabilities, the model can process periodic data in real time and adapt to dynamically changing environments, thereby generating more optimized paths. In the experimental section on path planning quality later, the reduction in the number of scalable nodes is closely related to Classification LSTM.
Deployment method of visual sensors in a limited area
Path marking backtracking strategy and dead zone path simplification are two important strategies of the deployment method.
Obtaining the information relationship between the robot system and the target position is the premise of studying the path marking backtracking strategy. The information relationship diagram is shown in Fig. 7.
The excitation function \(p\left( {x, y} \right)\) and the inhibition function \(q\left( {x, y} \right)\) are the key to obtaining information relationships. The calculation formula is as follows:
where \(x_{object}\) and \(y_{object}\) are the real-time positions of the robot. \(x_{hinder}\) and \(y_{hinder}\) are the coordinate information of the obstacle. \(x_{goal}\) and \(y_{goal}\) are the target coordinate information. \(M_{robot}\) is the mass of the robot. When the robot is close to the target position, \(p\left( {x, y} \right)\) increases. When the robot is close to the obstacle position, \(q\left( {x, y} \right)\) increases. Therefore, with each step, the excitation function and the inhibition function will continue to change.
Theorem 2
Single plane exploration displacement data set . Initial value of trajectory array \(start\left[ m \right]\left[ n \right]\). If the incremental position of the next trajectory \({\text{star}}t^{ + } \left[ m \right]\left[ n \right]\) is \(\left( {x, y} \right)\), the exploration direction is \(u = u + du\). At the same time, the sensor deployment element rules are met, then \({\text{star}}t^{ + } \left[ {m_{x} } \right]\left[ {n_{y} } \right] = 1\), that is, the path marking is completed. \(start\left[ m \right]\left[ n \right] = {\text{star}}t^{ + } \left[ m \right]\left[ n \right] - {\text{star}}t^{ + } \left[ {m_{x} } \right]\left[ {n_{y} } \right]\). The initial position displacement data set becomes , and the iteration is repeated until \({\mathcal{M}} = \emptyset\).
Based on Theorem 2, the expansion function in RRT is modified and optimized according to (20) and (21), as shown in Table 1.
In Table 1, \(M\) is the path planning environment. \(V\) is the vertex set. \(E\) is the edge set. \(x_{init}\) is the starting position. \(x_{goal}\) is the target position. \(\Gamma\) is the path. \(i\) is the number of iterations. \(ch\left( x \right)\) is the complex obstacle function. In this process, any node \(x_{rand}\) is randomly emitted from \(x_{init}\), and its closest distance \(x_{near}\) is found. Then \(x_{new}\) is the \({\text{Stepsize}}\) from \(x_{near}\) to \(x_{rand}\). If the activation function \(p\left( {x_{new} } \right)\) of the new node is greater than the activation function \(p\left( {x_{i - 1} } \right)\) of any previous node, then the new node is expanded into the set. \({\text{CollisionFree}}\left( {M^{\prime},{ }E_{i} } \right)\) is used to monitor obstacles in \({\text{Edge}}\left( {x_{new} , x_{near} , ch\left( x \right)} \right)\).
Completing the local full traversal of the current segment based on systems (20) and (21) and Theorem 2 is the key to optimizing the deployment. First, set the planning environment to be divided into several unit areas. The side length of the unit area is much smaller than the set sensor detectable range. The set number of sensors must be greater than the number of path marking positions. The set of visual sensors is represented by \(Sensor = \left\{ {Sr_{1} , Sr_{2} , \ldots , Sr_{n} } \right\}\). The deployment element of each sensor is represented by \(element = \left\{ {ep_{1} ,ep_{2} , \ldots ,ep_{n} } \right\}\). When the planned position satisfies the deployment element of a certain sensor, the path is marked. For each step forward, if Theorem 2 is satisfied, it is iterated and updated repeatedly until the optimal planning route is found. We can simulate this process through a simple node model.
The node model diagrams in Fig. 8a,b describe the implementation process of the path marking backtracking strategy. In (a), 14 nodes represent key positions in path planning, of which 11 blue nodes represent marked path positions and 3 yellow nodes represent path positions to be marked. Arrows represent the association between nodes. For example, node 1 points to node 4 and then to node 7, indicating the path direction of the robot. In (b), the color of some nodes changes, with nodes 1 and 4 turning red and node 7 turning green, indicating the application of the path marking backtracking strategy. At the same time, the path from 1 to 4 and then to 7 disappears, indicating that the path planning has been updated according to the data to avoid the inaccessible area.
Dead zone path simplification is to remove the "closed dead zone". It will further realize the partial path simplification after the path marking backtracking strategy. When there is no sensor deployment at the planned location, that is, there is no associated sensor deployment element, then the planned location is a dead zone location. The specific process is shown in Table 2.
Based on the above information, we assume that there are \(N_{s}\) visual sensors, and the position of each sensor \({\text{s}}_{{\text{i}}}\) is represented by \(\left( {{\text{x}}_{{{\text{s}}_{{\text{i}}} }} ,{\text{y}}_{{{\text{s}}_{{\text{i}}} }} ,{\text{z}}_{{{\text{s}}_{{\text{i}}} }} } \right)\). Its coverage is determined by the radius \(r_{i}\). Then the deployment set of sensors is represented by \({\mathcal{S}} = \left\{ {\left( {x_{{s_{i} }} ,y_{{s_{i} }} ,z_{{s_{i} }} } \right),\left( {x_{{s_{i} }} ,y_{{s_{i} }} ,z_{{s_{i} }} } \right), \ldots ,\left( {x_{{s_{i} }} ,y_{{s_{i} }} ,z_{{s_{i} }} } \right)} \right\}\). Assume that the map is divided into \(M \times N \times L\) volume units. Use a three-dimensional array \({\mathbb{M}}\) to represent the map, and its element \(m_{i,j,k}\) represents the state at the volume unit \(\left( {x_{i} ,y_{j} ,z_{k} } \right)\). Then map information acquisition can be represented by \({\mathbb{M}}\).
where \(M,N\) and \(L\) are the resolutions of the map in the \(x,y\) and \(z\) directions respectively. The state here can be 1 for obstacles or 0 for no obstacles.
Define \(S\left( {x,y,z} \right)\) to represent the area covered by the sensor. For each volume unit \(\left( {x_{i} ,y_{j} ,z_{k} } \right)\) in the map, it is expressed as follows:
where \(x_{sl} ,y_{sl}\) and \(z_{sl}\) represent the coordinates of the lth sensor. \(r_{l}\) is the coverage radius of the lth sensor. \({\text{max}}\) represents the maximum value operator, which is used to determine whether the sensor covers the volume unit.
Remark4
The goal of sensor deployment is to cover as much area as possible on the map to obtain comprehensive environmental information. We achieve this goal by optimizing the location and coverage of sensors. Map information acquisition is to collect environmental data through sensors and convert it into map state. This process includes the following steps.
-
(1)
Sensors sense the surrounding environment and collect data at various locations.
-
(2)
The collected data needs to be processed and analyzed to extract useful information.
-
(3)
Based on the processed data, we can build a map to represent the state of the environment. The map can be two-dimensional or three-dimensional, where each cell or volume unit corresponds to an area in the environment. Set the three-dimensional space to \({\mathbb{R}}^{3}\). The known map state matrix is \({\mathbb{M}}\). Then update the state matrix based on the results of sensor data processing. \({\mathbb{M}} = g\left( {D,m_{i,j,k}^{{\left( {t - 1} \right)}} } \right)\). Among them, \(g\) is the state update function, and \(m_{i,j,k}^{{\left( {t - 1} \right)}}\) is the map state in the previous cycle. \(D\) is the data processing result of the sensor.
-
(4)
The state of the map needs to be updated over time to reflect the dynamic changes of the environment. Sensors continuously sense the environment and update the map state to ensure the timeliness and accuracy of map information. During the update cycle with a time step of \(\Delta t\), the dynamic changes of the environment can be modeled by the state transition function. The state transition function is \({\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right) = m_{i,j,k}^{\left( t \right)}\). \({\mathcal{L}}\) combines the state of the previous time step and the sensor data of the current time step to update the current state. Then the dynamic update of the map state can be expressed as \(m_{i,j,k}^{\left( t \right)} = {\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,S\left( {x_{{s_{i} }} ,y_{{s_{j} }} ,z_{{s_{k} }} } \right),D^{\left( t \right)} } \right)\). It provides a detailed description of the sensor coverage area and the dynamic update of the map state. Through these descriptions, we can accurately deploy sensors and acquire and update map information.
To better describe the deployment process, we simulated the environment information with a map size of \(200 \times 200\). The resolution was 1.40 dynamically changing obstacles were simulated 6 times, as shown in Fig. 9. The starting point and the target point were set at (20, 20) and (180, 180), respectively, represented by blue and red dots. 30 visual sensors were deployed, and the coverage radius of the sensor was 20. If a point in the environment is covered by a sensor, the point is marked as a backtracking point. These points are backtracked in the final path and are not actually walked through.
Experiment analysis
To verify the feasibility of the theory, we verify it from three aspects: the quality of path planning in a two-dimensional static environment, the quality of path planning in a three-dimensional dynamic environment, and the effect of data prediction. The methods we compare include Traditional RRT, Extended_RRT, Goal-bias RRT, RRT*, Dynamic weighted RRT algorithm (DW-RRT)23 and BP-RRT*24.
Path planning quality in 2D static environment
We compared IRPA (RRT planning for the PTM algorithm) with Extended_RRT, Goal-bias RRT, and RRT*. Planning time, response cycle, and number of extended nodes were used as evaluation indicators. The planning environment size was 50 × 30. Static planning was adopted. The coverage radius of the sensor was set to 5, and each algorithm was run 10 times. PR is used to store the coverage information of the sensor position in each run. That is, \(PR = \left( {\begin{array}{*{20}c} {Sensor_{11} } & \cdots & {Sensor_{1n} } \\ \vdots & \ddots & \vdots \\ {Sensor_{n1} } & \cdots & {Sensor_{nn} } \\ \end{array} } \right)\). Then the update of \(Sensor_{ij}\) can be regarded as the weight change of the local Lipschitz. \(w_{i} \leftarrow Sensor_{ij}\). The planning time can be directly obtained through software testing. The test method for the number of extended nodes and response cycle is shown as follows.
-
(1)
Number of extended nodes. During each run, when the algorithm generates a path from the starting point to the target point, each new node generated is counted once. The number of extended nodes is the number of all nodes generated by the algorithm before finding a valid path. When the algorithm termination condition is reached, the total number of extended nodes is recorded as \(N_{nodes}\). Given \(Output Date = \mathop \sum \limits_{i = 1}^{4} (y_{i} \left( {Qs_{p1} \left( {\rho ,\alpha } \right)} \right) - \theta )\) and \(w_{i} \leftarrow Sensor_{ij}\). Then, when the algorithm is not completely terminated, the generation of new nodes causes the value tuple function of \(Qs_{p1} \left( {\rho ,\alpha } \right)\) to be lost after passing through the transfer gate. That is, \(Qs_{p1} \left( {\rho ,\alpha } \right)|_{t + 1} < Qs_{p1} \left( {\rho ,\alpha } \right)|_{t}\). Then the output gate will generate fewer nodes. That is, \(N_{nodes} |_{t + 1} < N_{nodes} |_{t}\). After repeated iterations, take the average expansion node of 10 times. \(\overline{{N_{nodes} }} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} N_{i}\).
-
(2)
During each run, record the time interval of each step as \(\Delta {\text{t}}\). The response cycle is the time required for the algorithm to generate the next state \(\left( {P_{state} |_{t} } \right)\) from the current state \(\left( {P_{state} |_{t + 1} } \right)\). In each run, calculate the average of all time intervals. That is, \(T_{cycle}\). When the algorithm termination condition is reached, record the average response cycle. Calculate the average response cycle of 10 times. \(\overline{{T_{cycle} }} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} T_{cycle}^{\left( i \right)}\). \(T_{cycle} = \frac{1}{{N_{nodes} }}\mathop \sum \limits_{i = 1}^{{N_{nodes} }} \Delta t_{i}\). \(T_{cycle}^{\left( i \right)}\) is the average response cycle of the i-th run. \({\text{n}} = 10\).
The comparison results of path planning quality are shown in Fig. 10.
In Fig. 10, the blue and green blocks are the starting and ending points. The red curve is the planned route. The green curve is the extended node. The gray part is the obstacle position. (a) It uses the information of the initial path instead of completely randomly scattering points. It has disadvantages such as excessive generation of branches. (b) It has both excessive branches and drawbacks such as falling into local optimal solutions. However, compared with (a), the search time is shorter and the number of extended nodes is less. (c) Compared with the former, the planned path is significantly shortened, but the response cycle is long and the number of extended nodes is large. In addition, it also has problems such as over-optimization and parameter dependence. (d) While ensuring the completeness of the search path, it achieves the effect of fewer extended nodes and shorter search time. The parameter comparison of the four methods is shown in Table 3.
Table 3 shows that the planning time, response cycle and number of expanded nodes generated by IPRA are better than those of the other methods. Among them, the planning time is reduced by 76.4% of the original. The response time is reduced by 82.3% of the original. The number of expandable nodes is reduced by 86.7% of the original.
As known from Remark 4, the state of the map needs to be updated over time. While ensuring the accuracy of the map information, the state transition function is modeled with a time step of \(\Delta t\). The state transition function is expressed as \({\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right) = m_{i,j,k}^{\left( t \right)}\). Then the number of times \({\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right)\) is updated with \(\Delta t\) represents the number of iterations of the algorithm to search for the optimal path. In continuous iterations, the iteration error will change with the change of \({\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right)\). The iteration error refers to the gap between the generated path and the optimal path during the path planning process. It is similar to the discrete error in12. This error can be expressed as \(Path_{er} = \left| {{\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right)_{\Delta t} - {\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right)_{{\Delta t^{ - } }} } \right|\) based on \({\mathcal{L}}\left( {m_{i,j,k}^{{\left( {t - 1} \right)}} ,D^{\left( t \right)} } \right) = m_{i,j,k}^{\left( t \right)}\). \(\Delta t^{ - }\) is the previous search step relative to \(\Delta t\). In IRPA, the initial step size is set to 1 and the error threshold is set to 0.01. The target bias probability of traditional RRT is set to 0.1. Initialize the high-order Lipschitz vector field parameters and the Classification LSTM model. In each iteration, the path is adjusted according to the error value, and the Classification LSTM is used to filter the data. When the error is lower than the threshold, the search path is terminated and the search iteration is stopped. Figure 11 shows this iterative process.
In Fig. 11, the number of iterations of IRPA is about 2400, and the error at this time is less than 0.01. There was also a situation where the error was less than 0.01 before 2400, but the adjustment of the nodes proved that it was not the optimal path at this time. (a) The figure shows staggered spike changes. This shows that in each iteration, there is a large uncertainty and fluctuation in the position of the new node generated by the algorithm. This is because IRPA will continuously optimize and adjust during the path generation process, resulting in a large change in the error. The traditional algorithm still did not converge after nearly 20,000 iterations. Because the traditional RRT uses a fixed step size and rules in the path generation process, the error changes are relatively stable. Therefore, (b) shows a smooth blue line.
Path planning quality in 3D dynamic environments
In order to make the path planning more applicable to general Differential Drive Robots (DDR), we simulated and tested the method through a three-dimensional scene. The map environment also changed from static to dynamic. This dynamic change simulates the actual environment more realistically. The map size is set to \(20 \times 20 \times 10\). The number of sensors is set to 10. The algorithm step size in the RRT expansion function is set to 0.2. The backtracking step size is 0.2. In the LSTM output, the output state in each step is represented as \(h_{t} = \alpha \left( {W_{h} \cdot \left[ {h_{t - 1} ,x_{t} } \right] + \theta } \right)\). \(x_{t}\) is the input data, and \(W_{h}\) is the weight. \(h_{t - 1}\) is the output state of the previous step. Then the parameters in \(Output Date = \mathop \sum \limits_{i = 1}^{4} (y_{i} \left( {Qs_{p1} \left( {\rho ,\alpha } \right)} \right) - \theta )\) change due to the update of \(h_{t}\). By adjusting \(W_{h}\), \(h_{t}\) and \(Output Date\) change until all the initial position data sets are empty. That is, \({\mathcal{M}} = \emptyset\). At the same time, \(Qs_{p1} \left( {\rho ,\alpha } \right)|_{t + 1} < Qs_{p1} \left( {\rho ,\alpha } \right)|_{t}\). The number of node expansions that occur each time \(h_{t}\) is adjusted is different. Figure 12 simulates the path planning test in a three-dimensional dynamic environment.
Figure 12 simulates two debugging of \(W_{h}\). They are denoted as \(W_{h1}\) and \(W_{h2}\) respectively. The number of nodes expanded in \(W_{h1}\) is 569, and the number of nodes expanded in \(W_{h2}\) is 272. The environment is dynamically changed based on the debugging parameters of \(W_{h1}\) and \(W_{h2}\). In \(W_{h1}\), the number of expanded nodes is close to Extended_RRT, and \({\mathcal{M}}\) can never search for empty data, and the loop continues. Therefore, the planned path mistakenly regards obstacles as feasible paths, resulting in collisions. In \(W_{h2}\), we did an interesting test. \(Qs_{p1} \left( {\rho ,\alpha } \right)|_{t + 1} < Qs_{p1} \left( {\rho ,\alpha } \right)|_{t}\) and \(Qs_{p1} \left( {\rho ,\alpha } \right)|_{t + 1} > Qs_{p1} \left( {\rho ,\alpha } \right)|_{t}\) were set respectively. The optimal design path and the collision path were obtained. It can be seen that the initial position data is crucial for the search conditions of the expanded nodes in a three-dimensional state. Figure 13 tests this dynamically changing environment.
In Fig. 13, A is a laser radar, which is used to scan environmental information. B is a gyroscope, which is used to determine the angle of the avoidance system, such as the deflection angle, pitch angle and yaw angle. C is a router, which is used for remote debugging. D is a CPU, which is used to receive intelligent input algorithms and is the core of the avoidance system. E is a single-chip microcomputer, which can control the drive of the entire system as a control system.
Data prediction effect
In path planning, the map data in each time step must be predicted to determine whether it is the optimal node or obstacle. Therefore, the data prediction effect is particularly important. We compared IRPA with DW-RRT and BP-RRT* for data prediction. DW-RRT23 is an improved RRT algorithm based on dynamic weighted maps recently proposed by Zhao and Wu. This algorithm can predict the path where collision may occur in advance when the search path is complete, effectively improving the efficiency of path planning. BP-RRT*24 is a path planning algorithm based on BP neural network and improved rapidly expanding random tree proposed by Gao and Yuan et al. However, due to the discretization of obstacles and distance weight functions, the stability of planning is limited to a certain extent.
In this experiment, we take the difference between the actual and observed node expansion in each time step as the ideal data. When the predicted value is higher than the actual observed value, the prediction error is positive, that is, \(\left| {Pre_{i} - Act_{i} } \right| > 0\); conversely, when the predicted value is lower than the actual observed value, the prediction error is negative, that is, \(\left| {Pre_{i} - Act_{i} } \right| < 0\). In Markov’s nonlinear analysis parameters, \(\Delta t = 0.1s\). The number of layers of the LSTM network in the Classification LSTM is 4. Each layer contains 100 neurons. The training data contains 100, 150 and 200 data samples respectively. The characteristic dimension of the input data is 10. The backtracking step is 0.2. Then each data sample is used as the action state of the system (14). That is, \(x = \left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)^{T} \epsilon R^{n} \to x = \left( {date_{d1} ,date_{d2} , \ldots ,date_{dn} } \right)^{T} \epsilon R^{n}\). \(f\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right) \to f\left( {date_{d1} ,date_{d2} , \ldots ,date_{dn} } \right)\) is a multidimensional local Lipschitz continuous vector field. The weights assigned to each layer of LSTM are set to \(\omega_{lstm}^{1} = 0.4,\omega_{lstm}^{2} = 0.2,\omega_{lstm}^{3} = 0.2,\omega_{lstm}^{4} = 0.2\). \(p\left( {x_{new} } \right)\) in the prediction always satisfies \(p\left( {x_{new} } \right) > \forall p\left( {x_{i - 1} } \right)\). Regardless of \(\left| {Pre_{i} - Act_{i} } \right| > 0\) or \(\left| {Pre_{i} - Act_{i} } \right| < 0\), we construct an evaluation function about \(p\left( {x_{new} } \right)\).
where \(L_{path}\) is the path length. \(C_{collsions}\) is the number of extension points collected by \(CollisionFree\left( {M^{\prime } ,{ }E_{i} } \right)\). \(T_{exec}\) is the execution time of each layer of LSTM. \(E_{pred}\) is the prediction error. \({\text{n}} = 100,150,200\). It is ensured that \(CollisionFree\left( {M^{\prime } ,{ }E_{i} } \right)\) always has a quantitative advantage in any prediction. That is, \(Num\left( {CollisionFree\left( {M^{\prime } ,{ }E_{i} } \right)} \right) > Num\left( {Edge\left( {x_{new} , x_{near} , ch\left( x \right)} \right)} \right)\).
In Fig. 14, IRPA has good curve fit and difference stability with the ideal search data, indicating that its prediction error is small. Although DW-RRT and BP-RRT* also follow the ideal data, they fluctuate greatly, indicating that the prediction error is relatively high. In the iteration of 100 data samples, the prediction error of IRPA is significantly lower than that of DW-RRT and BP-RRT*. In the iteration of 150 and 200 data samples, IRPA still maintains high accuracy with small error fluctuations.
Conclusion
This paper studies an improved RRT behavior planning method for robots based on the PTM algorithm. A new solution is proposed for the problem of unstable path quality in multi-dimensional high-order nonlinear systems. We designed a high-order Lipschitz vector field dynamic system, a Classification LSTM, and a method for limited area sensor deployment. The high-order Lipschitz vector field dynamic system can adapt to the high-order dynamic characteristics of multi-degree-of-freedom robots and demonstrates excellent accuracy in path planning. The Classification LSTM ensures the global stability of data in path design by screening the training data model. The path marking backtracking strategy and dead zone path simplification method significantly improve the efficiency of path planning by deploying visual sensors in a limited area. This method effectively covers key areas in the environment and achieves better path planning results. Finally, this paper compares IRPA with other algorithms through simulation experiments in two-dimensional and three-dimensional environments to verify the superiority of IRPA.
Although the limited area sensor deployment strategy shows significant effects on path marking backtracking and dead zone path simplification, its effectiveness is limited in closed environments, such as the path design inside the box of an automated loading and unloading truck25. In the future, we will strengthen the design of sensor deployment in more complex environments, such as a sensor array data monitoring method based on DLM Mask R-CNN that we recently studied. This will provide new ideas and solutions for path search.
Data availability
The datasets generated and/or analysed during the current study are available in the Zenodo repository, https://zenodo.org/records/12730125.
Abbreviations
- PTM:
-
Path tracking method
- RRT:
-
Rapidly-exploring random tree
- LSTM:
-
Long short-term memory
References
Yao, Z. et al. ReinforcedRimJump: Tangent-based shortest-path planning for two-dimensional maps. IEEE Trans. Industr. Inf. 16(2), 949–958 (2019).
Jan, G. E., Sun, C. C., Tsai, W. C. & Lin, T. H. An O(n log n) shortest path algorithm based on delaunay triangulation. IEEE/ASME Trans. Mech. 19(2), 660–666 (2013).
Mayya, S., Pierpaoli, P., Nair, G. & Egerstedt, M. Localization in densely packed swarms using interrobot collisions as a sensing modality. IEEE Trans. Rob. 35(1), 21–34 (2018).
Li, B. & Chen, B. An adaptive rapidly-exploring random tree. IEEE/CAA J. Autom. Sin. 9(2), 283–294 (2021).
Moon, C. & Chung, W. Kinodynamic planner dual-tree RRT (DT-RRT) for two-wheeled mobile robots using the rapidly exploring random tree. IEEE Trans. Industr. Electron. 62(2), 1080–1090 (2014).
Saccuti, A., Monica, R. & Aleotti, J. PROTAMP-RRT: A probabilistic integrated task and motion planner based on RRT. IEEE Robot. Autom. Lett. 8(12), 8398–8405 (2023).
Lim, W. et al. Hybrid trajectory planning for autonomous driving in on-road dynamic scenarios. IEEE Trans. Intell. Transp. Syst. 22(1), 341–355 (2019).
Zhang, X., Zhang, R. & Wang, X. Visual slam mapping based on yolov5 in dynamic scenes. Appl. Sci. 12(22), 11548 (2022).
Rodriguez, S., Tang, X. & Lien, J. M., et al. An obstacle-based rapidly-exploring random tree. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. 895–900 (IEEE, 2006).
Lan, W. et al. Improved RRT algorithms to solve path planning of multi-glider in time-varying ocean currents. IEEE Access 9, 158098–158115 (2021).
An, B., Kim, J. & Park, F. C. An adaptive stepsize RRT planning algorithm for open-chain robots. IEEE Robot. Autom. Lett. 3(1), 312–319 (2017).
Hoy, M., Matveev, A. S. & Savkin, A. V. Algorithms for collision-free navigation of mobile robots in complex cluttered environments: Asurvey. Robotica 33(3), 463–497 (2015).
Bianco, C. G. L. Minimum-jerk velocity planning for mobile robot applications. IEEE Trans. Robot. 29(5), 1317–1326 (2013).
Zhou, B., Pan, J., Gao, F. & Shen, S. Raptor: Robust and perceptionaware trajectory replanning for quadrotor fast flight. IEEE Trans. Robot. 37(6), 1992–2009 (2021).
Nilles, Q., Ren, Y., Becerra, I. & LaValle, S. M. A visibilitybased approach to computing non-deterministic bouncing strategies. Int. J. Robot. Res. 40(10–11), 1196–1211 (2021).
Kumar, R. & Moore, J. B. Convergence of adaptive minimum variance algorithms via weighting coefficient selection. IEEE Trans. Autom. Control 27(1), 146–153 (1982).
Khalil, H. K. Nonlinear Systems 3rd edn. (Prentice-Hall, 2002).
Karafyllis, I. & Krstic, M. Adaptive certainty-equivalence control with regulation-triggered finite-time least-squares identification. IEEE Trans. Autom. Control 63(10), 3261–3275 (2018).
Jiang, C. et al. A fast and high-performance object proposal method for vision sensors: Application to object detection. IEEE Sens. J. 22(10), 9543–9557 (2022).
Sheffield, S. & Smart, C. K. Vector-valued optimal Lipschitz extensions. Commun. Pure Appl. Math. 65(1), 128–154 (2012).
Mor, B., Garhwal, S. & Kumar, A. A systematic review of hidden Markov models and their applications. Arch. Comput. Methods Eng. 28, 1429–1448 (2021).
Lindemann, B. et al. A survey on anomaly detection for technical systems using LSTM networks. Comput. Ind. 131, 103498 (2021).
Zhao, B. et al. Research on hybrid navigation algorithm and multi-objective cooperative planning method for electric inspection robot. Energy Rep. 9, 805–813 (2023).
Gao, Q. et al. Path planning algorithm of robot arm based on improved RRT* and BP neural network algorithm. J. King Saud Univ.-Comput. Inf. Sci. 35(8), 101650 (2023).
Luo, J., Wu, Y. & Mendes, A. B. Modelling of integrated vehicle scheduling and container storage problems in unloading process at an automated container terminal. Comput. Ind. Eng. 94, 32–44 (2016).
Acknowledgements
This work was completed while the author was studying for a master's degree in the School of Information and Electronic Engineering at Shandong Technology and Business University. The work was funded by the project of Yantai City Science and Technology Innovation to help the transformation of old and new kinetic energy. Fund: KXDNY2023-25 (Corresponding author: Zuoxun Wang).
Author information
Authors and Affiliations
Contributions
C. and W. Preparated of laboratory stand and wrote the main manuscript text. S. and Z. finished Simulation debugging. G. finished manuscript review edition. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cui, C., Wang, Z., Sui, J. et al. An improved RRT behavioral planning method for robots based on PTM algorithm. Sci Rep 14, 21776 (2024). https://doi.org/10.1038/s41598-024-72616-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-72616-4