Derivation of Dirac equation from the stochastic optimal control principles of quantum mechanics

In this paper, we present a stochastic approach to relativistic quantum mechanics. We formulate the three fundamental principles of this theory and derive the Dirac equation from them. This approach brings more insight into the nature of Dirac’s spinors. Furthermore, we provide a physical interpretation of the stochastic optimal control theory of quantum mechanics.


Introduction
In this paper, we will derive the stochastic Hamilton-Jacobi-Bellman (HJB) equation. The derivation presented here is inspired by the following papers: [1,2,3].

Derivation of the stochastic HJB equation
The stochastic equation of motion of the particle is:

  dx^µ(s) = u^µ(s) ds + σ dW^µ(s),   (2.1)

where x^µ are the spacetime coordinates x of the particle, u^µ are the components of the four-velocity u, and dW^µ are the increments of a Wiener process.
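As a quick numerical sanity check of the increment statistics implied by the Langevin-type equation of motion (2.1), assuming it has the form dx^µ = u^µ dτ + σ dW^µ, the following Python sketch samples increments of a single coordinate and confirms ⟨dx⟩ ≈ u dτ and Var(dx) ≈ σ² dτ. The numerical values of u, σ, and dτ are illustrative choices, not taken from the paper.

```python
import math
import random

# Illustrative parameters (assumptions, not values from the paper).
u = 0.5        # constant four-velocity component
sigma = 0.3    # noise amplitude
dtau = 1e-3    # proper-time step
n_samples = 200_000

random.seed(0)

# Sample increments dx = u*dtau + sigma*dW, with dW ~ N(0, dtau).
increments = [u * dtau + sigma * random.gauss(0.0, math.sqrt(dtau))
              for _ in range(n_samples)]

mean_dx = sum(increments) / n_samples
var_dx = sum((dx - mean_dx) ** 2 for dx in increments) / n_samples

# To leading order in dtau: <dx> = u dtau and <dx^2> - <dx>^2 = sigma^2 dtau.
print(mean_dx, u * dtau)
print(var_dx, sigma ** 2 * dtau)
```

The second moment of the increment being of order dτ (not dτ²) is exactly what forces the second-order Taylor expansion later in the derivation.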
The action is postulated as the minimum of the expected value of the stochastic action:

  S(x_i, u(τ_i → τ_f)) = E_{x_i} [ ∫_{τ_i}^{τ_f} L(x(s), u(s), s) ds ],   (2.2)

where L(x(s), u(s), s) is the Lagrangian of the test particle, which is a function of the control policy u(s) and the four-coordinates x(s) at proper time s. The subscript x_i on the expectation value means that the expectation is taken over all stochastic trajectories that start at x_i. The task of optimal control theory [4] is to find the control u(s), τ_i < s < τ_f, denoted as u(τ_i → τ_f), that minimizes the expected value of the action S(x_i, u(τ_i → τ_f)).
We introduce the optimal cost-to-go function for any intermediate proper time τ, where τ_i < τ < τ_f:

  J(τ, x_τ) = min_{u(τ → τ_f)} E_{x_τ} [ ∫_τ^{τ_f} L(x(s), u(s), s) ds ].   (2.3)

By definition, the minimized action S(x_i, u(τ_i → τ_f)) is equal to the cost-to-go function J(τ_i, x_{τ_i}) at the initial proper time and spacetime coordinate:

  min_{u(τ_i → τ_f)} S(x_i, u(τ_i → τ_f)) = J(τ_i, x_{τ_i}).   (2.4)

We can rewrite (2.3) as a recursive formula for J(τ, x_τ) for any intermediate time τ′, where τ < τ′ < τ_f:

  J(τ, x_τ) = min_{u(τ → τ′)} E_{x_τ} [ ∫_τ^{τ′} L(x(s), u(s), s) ds + J(τ′, x_{τ′}) ].   (2.5)

In the above equation we split the minimization over two intervals. These are not independent, because the second minimization is conditioned on the starting value x_{τ′}, which depends on the outcome of the first minimization.
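The interval-splitting step above is the dynamic programming principle. As an illustration only, in a toy discrete-time deterministic analogue rather than the paper's continuous stochastic setting, the following Python sketch checks that a backward Bellman recursion reproduces the brute-force minimum over entire control sequences. The dynamics, costs, and numbers are illustrative assumptions.

```python
import itertools

# Toy analogue of the Bellman split: dynamics x' = x + u,
# running cost L(u) = u^2 / 2, terminal cost x^2.
# All choices here are illustrative, not taken from the paper.
CONTROLS = [-1.0, -0.5, 0.0, 0.5, 1.0]
STEPS = 3

def terminal_cost(x):
    return x * x

def running_cost(u):
    return 0.5 * u * u

def cost_to_go(x, k):
    """Backward Bellman recursion: J(k, x) = min_u [L(u) + J(k+1, x+u)]."""
    if k == STEPS:
        return terminal_cost(x)
    return min(running_cost(u) + cost_to_go(x + u, k + 1) for u in CONTROLS)

def brute_force(x0):
    """Minimum of the total cost over all whole control sequences."""
    best = float("inf")
    for seq in itertools.product(CONTROLS, repeat=STEPS):
        x, total = x0, 0.0
        for u in seq:
            total += running_cost(u)
            x += u
        best = min(best, total + terminal_cost(x))
    return best

x0 = 1.5
print(cost_to_go(x0, 0), brute_force(x0))  # the two minima coincide
```

The recursion minimizes over the first control and the remaining tail separately, yet reaches the same optimum, because the tail minimization is conditioned on the state the first step produces.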
If τ′ is a small increment of τ, τ′ = τ + dτ, then:

  J(τ, x_τ) = min_u E_{x_τ} [ L(τ, x_τ, u) dτ + J(τ + dτ, x_{τ+dτ}) ].   (2.6)

We must take a Taylor expansion of J in dτ and dx. However, since dx² = σ² dτ is of order dτ, we must expand up to order dx²:

  J(τ, x_τ) = min_u ( L(τ, x_τ, u) dτ + ∫ dx_{τ+dτ} N(x_{τ+dτ} | x_τ, σdτ) [ J(τ, x_τ) + ∂_τ J(τ, x_τ) dτ + dx^µ ∂_µ J(τ, x_τ) + (1/2) dx^µ dx^ν ∂_µ ∂_ν J(τ, x_τ) ] ).   (2.7)

Here N(x_{τ+dτ} | x_τ, σdτ) is the conditional probability to end up in state x_{τ+dτ} starting from state x_τ. The integration is over the entire spacetime. In the above equation we also use the notation ∂_µ J(τ, x) for the partial derivative with respect to x^µ.
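The order counting behind the expansion above can be checked on a closed-form example. Assuming one-dimensional Gaussian increments dx ~ N(u dτ, σ² dτ) and the test function J(x) = sin x (both illustrative choices, not from the paper), the exact smoothed value E[J(x + dx)] = sin(x + u dτ) · exp(−σ² dτ / 2) agrees with the second-order Taylor expansion up to O(dτ²), whereas stopping at first order leaves an O(dτ) error:

```python
import math

# Illustrative parameters (assumptions, not values from the paper).
x, u, sigma, dtau = 0.7, 0.5, 0.3, 1e-2

# Exact Gaussian expectation: for dx ~ N(u*dtau, sigma^2*dtau),
# E[sin(x + dx)] = sin(x + u*dtau) * exp(-sigma^2 * dtau / 2).
exact = math.sin(x + u * dtau) * math.exp(-sigma ** 2 * dtau / 2)

# Second-order Taylor expansion J + <dx> J' + (1/2) <dx^2> J''
# with J = sin, J' = cos, J'' = -sin, keeping terms up to order dtau.
taylor = (math.sin(x)
          + u * dtau * math.cos(x)
          - 0.5 * sigma ** 2 * dtau * math.sin(x))

# First-order expansion, which wrongly drops the dx^2 term.
taylor_first = math.sin(x) + u * dtau * math.cos(x)

print(abs(exact - taylor))        # small: residual is O(dtau^2)
print(abs(exact - taylor_first))  # larger: missing the O(dtau) dx^2 term
```

This is exactly why the expansion must be carried to order dx²: the second moment of the increment contributes at the same order in dτ as the drift.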
We can calculate dx^µ using equation (2.1):

  dx^µ = u^µ dτ + σ dW^µ,   (2.8)

from where:

  ⟨dx^µ⟩ = u^µ dτ,   (2.9)

where ⟨·⟩ denotes the expectation with respect to N(x_{τ+dτ} | x_τ, σdτ). In a similar way we calculate dx^ν dx^µ:

  dx^ν dx^µ = u^ν u^µ dτ² + σ (u^ν dW^µ + u^µ dW^ν) dτ + σ² dW^ν dW^µ,   (2.10)

from where we derive:

  ⟨dx^ν dx^µ⟩ = σ² η^{νµ} dτ + O(dτ²).   (2.11)

After substituting the above equations in (2.7) we derive the stochastic HJB equation:

  −∂_τ J(τ, x) = min_u ( L(τ, x, u) + u^µ ∂_µ J(τ, x) ) + (σ²/2) η^{µν} ∂_µ ∂_ν J(τ, x).   (2.12)

It is clear from the definition of J(τ, x_τ) that we have the following boundary condition:

  J(τ_f, x) = 0.   (2.13)

The optimal control at the current x, τ, is given by:

  u(x, τ) = arg min_u ( L(τ, x, u) + u^µ ∂_µ J(τ, x) ).   (2.14)
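The minimization in (2.14) becomes explicit for a concrete Lagrangian. Assuming, purely for illustration, a quadratic Lagrangian L(u) = u²/2 in one dimension (not the paper's Lagrangian), calculus gives the closed form u* = −∂J, which the following Python sketch confirms by brute-force search over a grid of candidate controls:

```python
# For the illustrative quadratic Lagrangian L(u) = u^2/2 (an assumption,
# not the paper's Lagrangian), the minimization in
# u(x, τ) = argmin_u (L(u) + u ∂J) has the closed form u* = -∂J.
p = 0.8  # a sample value of the gradient ∂J(τ, x) (illustrative)

def cost(u, p):
    return 0.5 * u * u + u * p  # L(u) + u * ∂J

# Brute-force argmin over a fine grid of candidate controls u in [-2, 2].
grid = [i * 1e-4 - 2.0 for i in range(40001)]
u_star = min(grid, key=lambda u: cost(u, p))

print(u_star)  # ≈ -p = -0.8
```

With this choice the minimized bracket in the HJB equation becomes −(∂J)²/2, turning (2.12) into a nonlinear PDE for J alone; that step depends on the specific Lagrangian and is not asserted here for the paper's case.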

Bibliography
[1] Hilbert J. Kappen. Path integrals and symmetry breaking for optimal control theory.