A fully probabilistic control framework for stochastic systems with input and state delay

This paper proposes a unified probabilistic control framework for a class of stochastic systems with both control input and state time delays. The stochastic nature and the time delays in the system dynamics are considered simultaneously, thus providing a comprehensive and rigorous control methodology. The problem is formulated in a fully probabilistic framework, where the system dynamics and its controller are fully characterised by arbitrary probabilistic models. In this framework, the Kullback–Leibler divergence between the actual joint probability density function of the system dynamics and controller and a predefined ideal joint probability density function is used to characterise the discrepancy between the two distributions and to derive the randomised controller. Time delays in the control input and system state are taken into account in the optimisation process used to derive the optimal randomised controller. In addition, the analytic control solution of the time-delay fully probabilistic control problem is derived for a class of linear Gaussian stochastic systems, and the successive approximation approach is implemented to deal with the time-advanced components in the control law that result from the time delays. The effectiveness of the proposed control framework is then illustrated on a numerical example and a real-world example.


Review of current approaches
Time delays, which normally appear as transportation and communication lags and also arise as feedback delays in measurement and closed-loop systems, are commonly encountered in real-life practical systems including engineering, chemical, biological, climatological, and economic systems [13][14][15][16]. Controlling and understanding time-delay systems has always been very challenging. For these systems, it is important that time delays are incorporated in the system models and that they are considered in the theoretical and control analysis of the closed loop system behaviour [17]. Thus, a large and growing body of research on time delays and their compensation methods has been published. For instance, a graphical methodology to calculate the stabilizing values of PI controller parameters for a single-area load frequency control (LFC) system with time delay was proposed in [18]. In [16], a sliding mode control design for fractional-order systems with input and state time delays is presented. A delay discretization approach is introduced in [19] to improve the tolerable state delay margin of the interconnected power system, while [20] investigated the event-triggered drive-response synchronization control for Takagi-Sugeno fuzzy neural networked systems with time delay. Some current results related to the control problem of time-delay systems have been summarised in [13,21,22].
Despite these significant efforts, most of the existing research has considered either input time delay or state time delay in its models but not both [17,20,23], thus limiting its applicability to real-life situations. Moreover, a rich body of literature has tried to solve the time delay issue by transforming discrete-time systems with time delays into delay-free systems by properly defining new state variables [23,37]. This approach, however, significantly increases the dimensionality of the controlled system, which complicates the analysis of systems with long time delays and makes large-scale and high-dimensional systems numerically demanding. Recent work has also addressed non-constant time delays, where the time delays are considered to vary stochastically [24][25][26][27]. Most of the aforementioned approaches, on the other hand, were presented under the assumption that the dynamics of the systems being considered follow a deterministic description. Such an assumption is not realistic, since real-world processes are usually subject to various sources of uncertainty including random noise, functional uncertainties, and disturbances introduced by measurement devices and other surrounding environmental conditions. For such systems, which involve both high levels of uncertainty and input and/or state delays, the controller design becomes much more challenging.
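To make the dimensionality argument concrete, the following Python sketch lifts a single-delay system $x_t = A x_{t-1} + A_d x_{t-h}$ into a delay-free form by stacking the last $h$ states. The matrices, the delay value and the helper name `augment` are illustrative assumptions and are not taken from the cited works; the sketch only shows that the augmented state has dimension $n h$.

```python
import numpy as np

def augment(A, A_d, h):
    """Delay-free matrix F for x_t = A x_{t-1} + A_d x_{t-h} (hypothetical helper).

    The augmented state is z_{t-1} = [x_{t-1}, x_{t-2}, ..., x_{t-h}],
    so that z_t = F z_{t-1}.
    """
    n = A.shape[0]
    F = np.zeros((n * h, n * h))
    F[:n, :n] = A                      # contribution of x_{t-1}
    F[:n, (h - 1) * n:] += A_d         # contribution of x_{t-h} (last block)
    F[n:, :-n] = np.eye(n * (h - 1))   # shift the stored history by one step
    return F

# Illustrative values only: a 2-state system with a delay of 15 steps.
A = np.array([[0.5, 0.1], [0.0, 0.4]])
A_d = np.array([[0.1, 0.0], [0.05, 0.1]])
F = augment(A, A_d, h=15)
print(F.shape)  # (30, 30)
```

Even for this small system the lifted matrix is 30 by 30, which illustrates why the augmentation becomes costly for long delays or high-dimensional states.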
To address the stochasticity of the system dynamics, a method based on the Kullback–Leibler divergence between probability density functions was proposed in [28,29]. This method is referred to as the fully probabilistic design (FPD) method. The conventional FPD method has since been applied and extended to various classes of stochastic systems in recent decades. For instance, in [30], the conventional FPD has been modified and extended for a class of stochastic dynamic systems with multiplicative noise, while [31] combined the FPD method with a disturbance observer based controller. For systems with delays, the work in [32] extended the conventional FPD and proposed a Time Delay Fully Probabilistic Design (TDFPD) method for a class of stochastic systems with input delays. However, the TDFPD method proposed in [32] only considered a single input delay in the system dynamics, which is very limiting for real-world applications.
As such, the objective of the current paper is to extend the method in [32] by developing a probabilistic control framework for a class of stochastic systems that has multiple input and state time delays. Considering the stochastic nature of the class of systems under study, the proposed methodology characterises the dynamics of the system using probabilistic models. The framework then adopts and extends the probabilistic design method [32] such that multiple state and input time delays are taken into consideration in the derivation of the optimal controller. Following this approach, the optimal controller will be a randomised controller that minimises the Kullback–Leibler divergence between the joint probability density function (pdf) of the system dynamics and a predefined desired joint pdf. As will be seen from further development, the fully probabilistic control solution derived in this paper, which will be referred to as the Multiple Time Delay Fully Probabilistic Control (MTDFPC), has several advantages, including the attainment of a closed-form randomised optimal controller, the consideration of system noise and uncertainties in the system dynamics, and the consideration of multiple time delays, all in a unified probabilistic framework. Due to the existence of the input and state delays, the obtained randomised optimal solution contains both time-delay and time-advance terms, which makes it difficult to solve analytically. To address this problem, we adopt the successive approximation approach (SAA) [33] to solve the control problem. The advantage of this method is that it is suitable not only for small time delays but also for large time delays.
To re-emphasise, compared with the existing results on this topic, the contribution of the proposed framework in this paper can be summarised as follows. Firstly, considering the stochastic nature of the system dynamics, a fully probabilistic control framework is developed which considers the uncertainties and noise in the system dynamics as well as the multiple time delays in the control input and system state. Unlike most of the existing literature, where the system dynamics are described by deterministic equations, in our framework the system dynamics are completely characterised by pdfs. Secondly, this framework takes both multiple input delays and multiple state delays into consideration, extending the TDFPD [32], which only considers one single type of delay. The consideration of both input and state delays in the system models offers a more general and precise description of real-world system dynamics. This is considered the main contribution of this paper, as only a few existing control algorithms consider both multiple input delays and multiple state delays. Thirdly, a numerical optimal solution can be obtained using the SAA, which provides an explicit control procedure to follow.

Control objectives of the MTDFPC problem
In the fully probabilistic control design method [28] the aim is to design a randomised controller that shapes the joint pdf describing the closed loop behaviour of the controlled system and makes it as close as possible to a predefined ideal joint pdf. In this method, the discrepancy between the two joint pdfs is measured by the Kullback–Leibler divergence. The FPD method, however, assumes zero delay between the input and the system state and thus does not provide optimal solutions for systems with delays. As such, this method will be extended in this section to consider stochastic systems with multiple control input and state delays.
Consider the following probabilistic description of the considered class of stochastic systems with multiple state and control input delays, which can be represented at each time instant $t$ by the conditional pdf

$$s\left(x_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_t, u_{t-L_1}, \ldots, u_{t-L_{N_2}}\right), \tag{1}$$

where $x_t \in \mathbb{R}^n$ represents the system state, $u_t \in \mathbb{R}^m$ represents the control input, and $h_i$, $i \in \{1, 2, \ldots, N_1\}$, and $L_j$, $j \in \{1, 2, \ldots, N_2\}$, denote the time delays in the state and input respectively. Also, assume that $h = \max_i h_i$. For the formulation in this paper, $s(\cdot \mid \cdot)$ does not need to be known and does not need to be constrained by the Gaussian assumption. Also, note that because of the stochastic nature of the considered class of systems with time delays, the probabilistic description of the system dynamics given in (1) provides a complete specification of the present state conditioned on the previous state and the present and previous control. To re-emphasise, the probabilistic description (1) is general and can be characterised from the underlying stochastic evolution of the system dynamics. The formulation in this section will be based on this general probabilistic description. The results obtained here will then be demonstrated in the following sections on a class of stochastic linear time-delay systems with additive Gaussian noise. However, this formulation is not restricted by the assumption of additive noise, nor is it restricted to linear systems. The noise could be multiplicative and the system equation could be nonlinear.
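For these stochastic systems, the closed loop behaviour of the system dynamics can be specified by the joint probability density function of the system state and control input. This joint pdf of the closed loop dynamics of the system provides the most complete description of its behaviour. As such, similar to the conventional FPD approach, the objective of the MTDFPC control problem is specified as the design of a randomised controller, $c(u_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_{t-L_1}, \ldots, u_{t-L_{N_2}})$, that minimises the Kullback–Leibler divergence between the joint pdf of the closed loop description of the system dynamics, $f(X(t,T))$, and a predefined ideal joint pdf, ${}^{I}f(X(t,T))$,

$$D\left(f \,\|\, {}^{I}f\right) = \int f(X(t,T)) \ln \frac{f(X(t,T))}{{}^{I}f(X(t,T))} \, dX(t,T), \tag{2}$$

where $X(t,T) = \{x_t, \ldots, x_T, u_t, \ldots, u_T\}$ is the closed loop observed data sequence, and $T \le \infty$ is a given control horizon. For stochastic systems with multiple input and state delays given in (1), the joint pdf of the system dynamics, $f(X(t,T))$, can be evaluated using the chain rule [34] as follows,

$$f(X(t,T)) = \prod_{\tau=t}^{T} s\left(x_\tau \mid x_{\tau-1}, x_{\tau-h_1}, \ldots, x_{\tau-h_{N_1}}, u_\tau, u_{\tau-L_1}, \ldots, u_{\tau-L_{N_2}}\right) c\left(u_\tau \mid x_{\tau-1}, x_{\tau-h_1}, \ldots, x_{\tau-h_{N_1}}, u_{\tau-L_1}, \ldots, u_{\tau-L_{N_2}}\right), \tag{3}$$

where $s(\cdot \mid \cdot)$ is the conditional pdf of the system dynamics given in (1) and $c(\cdot \mid \cdot)$ represents the pdf of the required randomised controller as mentioned earlier.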
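As a concrete instance of the discrepancy measure, the Kullback–Leibler divergence has a well-known closed form when both pdfs are Gaussian. The sketch below evaluates it for two multivariate Gaussians. The function name `gaussian_kl` and the example numbers are illustrative assumptions; they are not part of the derivation in this section, which does not require Gaussianity.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence D(N(mu0, cov0) || N(mu1, cov1)) for multivariate Gaussians."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))   # tr(cov1^{-1} cov0)
    maha_term = diff @ np.linalg.solve(cov1, diff)       # Mahalanobis distance
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)

# Hypothetical actual and ideal state distributions (values chosen for illustration).
mu_actual, cov_actual = np.array([0.2, -0.1]), 0.02 * np.eye(2)
mu_ideal, cov_ideal = np.zeros(2), 0.01 * np.eye(2)
print(gaussian_kl(mu_actual, cov_actual, mu_ideal, cov_ideal))
```

The divergence is zero only when the two distributions coincide, which is why driving it down pulls the closed loop behaviour towards the ideal one.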
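Similarly, the ideal joint pdf of the closed loop data can be factorised as follows,

$${}^{I}f(X(t,T)) = \prod_{\tau=t}^{T} {}^{I}s\left(x_\tau \mid x_{\tau-1}, x_{\tau-h_1}, \ldots, x_{\tau-h_{N_1}}, u_\tau, u_{\tau-L_1}, \ldots, u_{\tau-L_{N_2}}\right) {}^{I}c\left(u_\tau \mid x_{\tau-1}, x_{\tau-h_1}, \ldots, x_{\tau-h_{N_1}}, u_{\tau-L_1}, \ldots, u_{\tau-L_{N_2}}\right), \tag{4}$$

where ${}^{I}s(x_t \mid \cdot)$ describes the ideal distribution of the system state vector $x_t$, and ${}^{I}c(u_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_{t-L_1}, \ldots, u_{t-L_{N_2}})$ represents the ideal pdf of the randomised controller. Given the definitions of the joint pdf of the closed loop system and the ideal pdf as specified by Eqs. (3) and (4) respectively, minimisation of (2) can be obtained recursively by introducing the definition in Eq. (5). This definition leads to the recursive formula for the cost function specified in the following theorem. This recursive formula will be used later for the derivation of the optimal randomised controller.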
The pdf of the optimal randomised controller can be shown to be given by the following theorem.

Theorem 2
The pdf of the optimal randomised controller minimising the cost-to-go function (6) subject to the conditional distribution of the stochastic system, s(

Solution to the MTDFPC for Gaussian probabilistic state space models
Theorem 2 provides the general solution for the considered class of stochastic systems with multiple input and state delays that can be described by arbitrary pdfs. This general solution, however, needs to be evaluated numerically if the system distributions are nonlinear or non-Gaussian. Computing the solution numerically incurs high computational costs that increase with the complexity and dimensionality of the problem. Therefore, to facilitate the understanding and the derivation of an analytical solution for the proposed probabilistic control framework, the solution stated in Theorem 2 will be applied here to a class of linear Gaussian stochastic dynamical systems that are also affected by multiple input and state delays. This class of linear stochastic systems is described by,

$$x_t = A x_{t-1} + \sum_{i=1}^{N_1} A_i x_{t-h_i} + B u_t + \sum_{j=1}^{N_2} B_j u_{t-L_j} + \varepsilon_t, \tag{11}$$

where $A$ is the system state matrix, $B$ is the control input matrix, $A_i$, $i = 1, \ldots, N_1$, represent the matrices of the delayed system state, and $B_j$, $j = 1, \ldots, N_2$, denote the matrices of the delayed input. In addition, $h_i$, $i \in \{1, 2, \ldots, N_1\}$, and $L_j$, $j \in \{1, 2, \ldots, N_2\}$, are the time delays in the state and input as discussed before. Moreover, $\varepsilon_t$ is a zero mean Gaussian noise with covariance $Q$. As discussed earlier, the effect of the noise $\varepsilon_t$ on the system state $x_t$ means that a complete specification of the system state can only be achieved through its probability distribution conditioned on the current control input and the previous state and control input. For the class of linear systems given in (11), the generative probabilistic model of the system state can be characterised by a Gaussian distribution as follows,

$$s\left(x_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_t, u_{t-L_1}, \ldots, u_{t-L_{N_2}}\right) \sim \mathcal{N}(\mu_t, Q), \tag{12}$$

where $\mu_t = A x_{t-1} + \sum_{i=1}^{N_1} A_i x_{t-h_i} + B u_t + \sum_{j=1}^{N_2} B_j u_{t-L_j}$ is the mean of the system state at time $t$.
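A minimal simulation sketch of a linear Gaussian system with multiple state and input delays of the kind described above is given next. It assumes the state update $x_t = A x_{t-1} + \sum_i A_i x_{t-h_i} + B u_t + \sum_j B_j u_{t-L_j} + \varepsilon_t$ with zero state and input history before time zero. The function name `simulate`, the argument layout and all numerical values are hypothetical and chosen only for illustration; they are not the examples used in this paper.

```python
import numpy as np

def simulate(A, A_delays, h, B, B_delays, L, Q, u_seq, x0, rng):
    """Simulate a delayed linear Gaussian system (illustrative sketch).

    A_delays[i] multiplies x_{t-h[i]}, B_delays[j] multiplies u_{t-L[j]}.
    u_seq has shape (T+1, m); states and inputs before time 0 are assumed zero.
    Returns an array of shape (T+1, n) holding x_0, ..., x_T.
    """
    T = u_seq.shape[0] - 1
    x = np.zeros((T + 1, x0.shape[0]))
    x[0] = x0

    def past(arr, idx):
        # Values before time 0 are taken to be zero (an assumption).
        return arr[idx] if idx >= 0 else np.zeros(arr.shape[1])

    for t in range(1, T + 1):
        mean = A @ x[t - 1] + B @ u_seq[t]
        for A_i, h_i in zip(A_delays, h):
            mean = mean + A_i @ past(x, t - h_i)
        for B_j, L_j in zip(B_delays, L):
            mean = mean + B_j @ past(u_seq, t - L_j)
        x[t] = rng.multivariate_normal(mean, Q)   # x_t ~ N(mu_t, Q)
    return x

# Hypothetical two-state example with two state delays and one input delay.
rng = np.random.default_rng(0)
A = np.array([[0.6, 0.1], [0.0, 0.5]])
A_delays = [0.1 * np.eye(2), 0.05 * np.eye(2)]
B = np.array([[1.0], [0.5]])
B_delays = [np.array([[0.2], [0.1]])]
u_seq = np.ones((50, 1))
states = simulate(A, A_delays, [2, 4], B, B_delays, [3], 0.02 * np.eye(2),
                  u_seq, np.zeros(2), rng)
print(states.shape)  # (50, 2)
```

Each sampled state is drawn from a Gaussian whose mean collects the current, delayed-state and delayed-input contributions, which mirrors the conditional pdf used as the system model.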
As discussed in the previous section, the control objective within the MTDFPC framework can be achieved by specifying the appropriate parameters of the ideal distribution that reflects the desired objective. In this paper, a tracking problem is considered, where the controller is designed to make the state of the system given by (11) follow a predefined reference state. Thus, for the probabilistic description of the system given in (12), the ideal distribution of the system state is taken to have the following form,

$${}^{I}s\left(x_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_t, u_{t-L_1}, \ldots, u_{t-L_{N_2}}\right) \sim \mathcal{N}(x_r, R), \tag{13}$$

where $x_r$ denotes the predefined reference state for the system state to track, and $R$ is the ideal covariance determining the spread of the state values around the desired reference state.
Similarly, the ideal distribution of the controller is specified by (14). Given the actual and ideal distributions defined in (12)–(14), the optimal randomised controller of the considered class of linear Gaussian stochastic systems with multiple state and control input delays can be calculated following Theorem 2. This leads to the randomised control solution specified by the following theorem.

Theorem 3
By substituting the ideal distribution of the system dynamics (13), the ideal distribution of the controller (14), and the actual distribution of the system dynamics (12) into Eqs. (8)–(10), the optimal randomised controller $c(u_t \mid x_{t-1}, x_{t-h_1}, \ldots, x_{t-h_{N_1}}, u_{t-L_1}, \ldots, u_{t-L_{N_2}})$ which minimises the cost-to-go function (6) is given by Eqs. (15)–(21), in which $\omega_{t-1}$ is a constant that does not depend on $x_{t-1}$ or $u_t$.

Note that (18) has an extra linear term $0.5 P_{t-1} x_{t-1}$. This is the consequence of the presence of the multiple lagged system states and control inputs. Besides, the solutions of the Riccati equation as well as the additional linear term in the mean of the optimised control input depend upon the delayed and future states and control inputs.
As can be seen from Eqs. (17)–(21), the solutions of both the optimal cost-to-go function and the optimal randomised controller require knowledge of future state and control input values, which makes the problem difficult to solve analytically. To address this issue, the successive approximation approach (SAA) introduced in [35] will be applied in this paper to obtain the numerical solution of the MTDFPC method. The numerical solution will be discussed in the next section.
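The way an iterative scheme can resolve time-advanced terms may be illustrated with a toy problem that is not the algorithm of this paper: a scalar sequence defined by $p_t = a\,p_{t+d} + q_t$ with $p_t = 0$ for $t > T$. In each sweep the advanced term $p_{t+d}$ is taken from the previous sweep, so no prediction of future values is needed. The function name and all numbers below are hypothetical.

```python
import numpy as np

def successive_approximation(a, q, d, num_sweeps):
    """Toy successive approximation for p_t = a * p_{t+d} + q_t, p_t = 0 for t > T.

    Each sweep recomputes the whole sequence using the time-advanced values
    stored from the previous sweep (initial guess: all zeros).
    """
    T = len(q) - 1
    p = np.zeros(T + 1)
    for _ in range(num_sweeps):
        p_prev = p.copy()
        for t in range(T + 1):
            advanced = p_prev[t + d] if t + d <= T else 0.0
            p[t] = a * advanced + q[t]
    return p

# Illustrative values: horizon of 10 steps and an advance of 3 steps.
q = np.ones(10)
print(successive_approximation(0.5, q, d=3, num_sweeps=4))
```

In this toy setting the iterates stop changing after a small number of sweeps, because each sweep propagates correct values a further $d$ steps back from the end of the horizon.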

Numerical solution to the MTDFPC using SAA
In this section, the SAA will be implemented to obtain the numerical solution of the MTDFPC problem. Proposed in [35], this approach iteratively solves a sequence of non-homogeneous linear equations in the LQ control problem, where each element of the sequence is worked out as a standard numerical problem. Following this approach, the future terms in the optimal control law and the corresponding optimality equations can be obtained from previous iterations, thus overcoming the requirement of predicting these future values. For more details about the SAA, the reader is referred to [35]. The procedure of a slight variation of the SAA for obtaining the approximate numerical solution of the optimal randomised controller is given in Algorithm 1. As can be seen from this algorithm, the optimisation of the randomised controller is carried out over a number of iterations, $K$, where in each iteration the sequence of randomised control inputs that optimises the control objective is obtained from time zero to the final time. Once the first iteration is completed, the required future state and control input values can be obtained and used in the next iteration. This process continues until convergence is achieved.

In addition, $\varepsilon_t$ is Gaussian noise with the distribution $\varepsilon_t \sim \mathcal{N}(0, 0.02 I_{3\times3})$, where $I_{3\times3}$ is the identity matrix of size 3. This example was used in [33] to demonstrate the theoretical development of LQRs for systems with multiple input and state delays.
To validate the performance of the MTDFPC derived in this paper, the results are compared to those obtained from the traditional FPD. The simulation results are given in Figs. 1, 2 and 3, where the blue solid line represents the state response controlled by MTDFPC, the red dashed line is the state controlled by traditional FPD, and the yellow dotted line stands for the corresponding state reference. From these figures, it can be seen that, in contrast to the system states controlled by traditional FPD, the states controlled by MTDFPC always track their corresponding reference states, even in the presence of the noise and the multiple state and input delays. The states controlled by the traditional FPD, on the contrary, show large tracking errors.

Electric heater system. To demonstrate the effectiveness of the proposed MTDFPC algorithm on real-world systems, this section discusses the results of applying the proposed algorithm to an industrial electric heater model which was used in [36,37]. The system structure is given in Fig. 4. From Fig. 4, it can be seen that the heater involves five heating zones, each equipped with an electric heater and its own thermocouple to measure its temperature profile. The system states are the temperatures in each zone, which are denoted by $\tilde{x}_1, \ldots, \tilde{x}_5$, while the control inputs are the electrical current signals applied to each zone of the heater, which are denoted by $\tilde{u}_1, \ldots, \tilde{u}_5$. The control objective is to maintain the temperature profile of the process at the pre-set operating points $\bar{x}_1$ to $\bar{x}_5$.
A state-delayed nominal discrete tracking-error based model for this system was obtained in [37]; it is given in Eq. (23) together with its system matrices and the definitions of the state and control vectors. In this model, $\tilde{x}_1$ to $\tilde{x}_5$ are the heater temperatures introduced above and $\bar{x}_1$ to $\bar{x}_5$ represent the operating points that they need to follow; similarly, $\tilde{u}_1$ to $\tilde{u}_5$ stand for the control electric current in each zone and $\bar{u}_1$ to $\bar{u}_5$ are their corresponding operating points. The system state $x_t$ in Eq. (23) is the tracking error of each zone, so the control objective here is to keep the system state around zero. Moreover, $\varepsilon_t$ is Gaussian noise representing the uncertainties and disturbances that affect the system; its distribution is specified in terms of $I_{5\times5}$, the identity matrix of size 5. In this simulation study, the state delay is taken to be $d = 15$, while the initial state is taken to be $x_0 = [-0.2, 0.5, 1, -0.4, 0.9]^T$, with $x_t = [0, 0, 0, 0, 0]^T$ for $t = -d + 1, \ldots, -1$. The number of SAA iterations is set to $K = 3$. As introduced earlier, the reference state is zero in this case, $x_r = [0, 0, 0, 0, 0]^T$.
Following the procedure provided in Algorithm 1, the system state responses are given in Figs. 5, 6, 7, 8 and 9. From these figures, we can see that despite the influence of the noise and the state delays, the designed local randomised controllers have successfully brought all the states to zero, indicating that all the heaters' temperatures are following their operating points. The results illustrate that the proposed algorithm achieved a satisfactory performance for the electric heater system that involves noise and state delays.

Conclusion
In this paper, the optimal randomised control problem for stochastic discrete-time systems that are affected by multiple control input and state delays has been considered. Probabilistic state-space models are exploited to characterise the dynamics of the system and an MTDFPC framework is developed by incorporating the multiple delays of the system state and control input into the derivation of the optimal randomised controller. Moreover, the analytic optimal control law for a class of linear Gaussian stochastic systems is obtained and its numerical solution is evaluated using the SAA method. Finally, one numerical example and one practical example demonstrated the effectiveness of the proposed MTDFPC framework for stochastic systems that are affected by multiple delays and randomness.

Figure 5. State $x_1$ and reference $x_r(1)$.
Figure 6. State $x_2$ and reference $x_r(2)$.
Figure 8. State $x_4$ and reference $x_r(4)$.
Figure 9. State $x_5$ and reference $x_r(5)$.