Summary & Conclusion

  1. The convergence of the value-iteration algorithm for solving the infinite-horizon discrete-time (DT) nonlinear optimal control HJB (Hamilton–Jacobi–Bellman) equation is proved.
  2. The optimal control is solved with an actor-critic method:

**A critic NN** is used to approximate the value function.

**An action network** is used to approximate the optimal control policy.

Advantage: complete knowledge of the system dynamics is not required.

The infinite-horizon discrete-time (DT) nonlinear optimal control problem is solved.

In the special case of linear systems with quadratic cost, the discrete-time linear quadratic regulator (DLQR) provides a well-known solution to the LQR problem.
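Since the linear-quadratic case is the benchmark special case referred to throughout, a minimal sketch of computing the DLQR solution via the discrete algebraic Riccati equation may be helpful; the matrices `A`, `B`, `Q`, `R` below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: DT LQR solved directly via the discrete algebraic
# Riccati equation (DARE). All matrices below are assumed examples.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed linear dynamics x_{k+1} = A x_k + B u_k
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                            # assumed quadratic state cost
R = np.array([[1.0]])                    # assumed quadratic control cost

P = solve_discrete_are(A, B, Q, R)       # value function kernel: V*(x) = x^T P x
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain, u_k = -K x_k
print("P =\n", P, "\nK =", K)
```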

I. INTRODUCTION

The DT nonlinear optimal control solution depends on the DT Hamilton–Jacobi–Bellman (HJB) equation.

Existing approaches are all offline methods for solving the HJB equation and require full knowledge of the system dynamics.

In this paper, we provide a full rigorous proof of convergence of the value-iteration-based HDP algorithm to solve the DT HJB equation of the optimal control problem for general nonlinear DT systems.

The point is stressed that these results also hold for the special LQR case.
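The LQR connection is standard: with linear dynamics $x_{k+1} = A x_k + B u_k$ the optimal value function is quadratic, $V^*(x) = x^T P x$, and the DT HJB equation reduces to the discrete-time algebraic Riccati equation (DARE). A sketch of this standard reduction follows (notation $A$, $B$, $P$ is assumed, not quoted from the paper):

```latex
% Substituting V^*(x) = x^T P x into the DT HJB equation for x_{k+1} = A x_k + B u_k
x_k^T P x_k = \min_{u_k}\Bigl( x_k^T Q x_k + u_k^T R u_k
              + (A x_k + B u_k)^T P (A x_k + B u_k) \Bigr)
\;\Longrightarrow\;
P = A^T P A - A^T P B \bigl( R + B^T P B \bigr)^{-1} B^T P A + Q
```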

Section II starts by introducing the nonlinear DT optimal control problem.

Section III demonstrates how to set up the HDP algorithm to solve for the nonlinear DT optimal control problem.

In Section IV, we prove the convergence of HDP value iterations to the solution of the DT HJB equation.

In Section V, we introduce two NN parametric structures to approximate the optimal value function and policy.

II. DT HJB EQUATION

Under certain conditions, the nonlinear system can be written in the input-affine form $x_{k+1} = f(x_k) + g(x_k) u_k$, with stage cost $x_k^T Q x_k + u_k^T R u_k$.

DT HJB equation: $V^*(x_k) = \min_{u_k}\left( x_k^T Q x_k + u_k^T R u_k + V^*(x_{k+1}) \right)$.

A first-order necessary condition is applied, so find the gradient of the right-hand side with respect to $u_k$,

then update the control along this gradient; setting the gradient to zero yields the optimal control (see the derivation sketch below).
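A sketch of this standard derivation, assuming the input-affine dynamics $x_{k+1} = f(x_k) + g(x_k) u_k$ and quadratic cost above (standard DT HJB notation, not quoted verbatim from the paper):

```latex
% Stationarity of the HJB right-hand side with respect to u_k
\frac{\partial}{\partial u_k}\Bigl( x_k^T Q x_k + u_k^T R u_k + V^*(x_{k+1}) \Bigr)
  = 2 R u_k + g(x_k)^T \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}} = 0
\quad\Longrightarrow\quad
u^*(x_k) = -\tfrac{1}{2}\, R^{-1} g(x_k)^T \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}}
```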

III. HDP ALGORITHM

1) Offline PI Algorithm (cf. the survey "Optimal and Autonomous Control Using Reinforcement Learning: A Survey")
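A minimal sketch of the value-iteration HDP recursion of Section III, specialized to the LQR case where each greedy minimization has a closed form: starting from $V_0 = 0$, the quadratic kernels $P_i$ converge to the DARE solution, which is the behavior the convergence proof guarantees. The system matrices are illustrative assumptions.

```python
# Value-iteration HDP in the LQR special case: V_i(x) = x^T P_i x.
# Each iteration computes the greedy control u_i and the updated value kernel,
# which for quadratics is exactly the Riccati difference equation below.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed dynamics
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

P = np.zeros((2, 2))                      # V_0 = 0, as in value iteration
for i in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy policy u_i = -K x
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K          # Riccati difference step
    if np.max(np.abs(P_next - P)) < 1e-10:              # stop once the kernels converge
        P = P_next
        break
    P = P_next
print("Converged P_i:\n", P)              # matches the DARE solution
```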

V. NN APPROXIMATION FOR VALUE AND ACTION
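A minimal sketch of the two parametric structures in Section V: a critic approximating the value function and an action network approximating the control policy. The one-hidden-layer numpy MLPs and layer sizes here are illustrative assumptions rather than the paper's exact basis functions, and the per-iteration training (least-squares fitting of both networks) is omitted.

```python
# Two parametric structures for HDP: a critic approximating V(x) and an
# action network approximating u(x). Both are small one-hidden-layer MLPs.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer tanh network."""
    return {"W1": 0.1 * rng.standard_normal((n_hidden, n_in)),
            "W2": 0.1 * rng.standard_normal((n_out, n_hidden))}

def mlp(params, x):
    """Forward pass: x -> tanh hidden layer -> linear output."""
    return params["W2"] @ np.tanh(params["W1"] @ x)

n_x, n_u = 2, 1
critic = init_mlp(n_x, 8, 1)    # approximates the value function V_i(x)
actor  = init_mlp(n_x, 8, n_u)  # approximates the control policy u_i(x)

x = np.array([0.5, -0.2])
print("V_hat(x) =", mlp(critic, x), " u_hat(x) =", mlp(actor, x))
```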