Non-Linear ADP Controller¶
The below algorithms is given in the book Robust Adaptive Dynamic Programming.
Problem statements¶
Given the nonlinear system equation (2), design a control policy that minimize the following cost function:
(1)¶
where with
a positive definite function, and
is symmetric and positive definite for all
If there exists a feedback control policy that globally asymtotically stabilizes the system (2) at the origin and the associated cost as defined in (1) and there exists a continous differentiable function
such that the Hamilton-Jacobi-Bellman (HJB) equation holds
then the control policy
(2)¶
globally asymptotically stabilizes (2) at , and
is also the optimal control policy
Off Policy Learning¶
Consider the system with the folowing control policy
(3)¶
where u0 is the initial admissible control policy and is the
exploration noise then rewrite it as
(4)¶
where .
Take the time derivative of along (4) and integrate the result within interval
to obtain:
(5)¶
Because the formula (2) requires the knowledge of the dynamic of the system, it makes practical application limited. One may use neural networks to approximate the control policy (the
actor)and the unknown cost function (the
critic).
where and
are two sequences of linearly independent smooth basis function,
is the weight of the neural networks.
Replacing them to the (5) and transform the result into the matrix, we have
(6)¶
where and the same for
and
Finally, one can realize that the (6) is actually a fixpoint equation in the form
Note
the initial controller
u0must beadmissiblecontrollerthe squences of basis functions
should be in the form of linearly independent smooth. Default basis functions is the
polynomial functions, seeOpenControl.ADP_control.NonLinController.default_psi_func()andOpenControl.ADP_control.NonLinController.default_phi_func()the time interval
Tfor data collection must be larger than the sample timenumber of datathe default function
is
Algorithm¶
Library Usage¶
Define a non-linear system like in System representation, then setup a simulation section by OpenControl.ADP_control.NonLinController.setPolicyParam() and perform simulation by OpenControl.ADP_control.NonLinController.offPolicy()
from OpenControl.ADP_control import NonLinController
##########define a controller##################
Ctrl = NonLinController(sys)
u0 = lambda x: 0 # the system is already globally stable
data_eval = 0.01; num_data = 80 # at leats n_phi+n_psi
explore_noise = lambda t: 0.2*np.sum(np.sin(np.array([1, 3, 7, 11, 13, 15])*t))
###############setup policy parameter############
Ctrl.setPolicyParam(data_eval=data_eval, num_data=num_data, explore_noise=explore_noise, u0=u0)
###############take simulation step##############
Wc, Wa = Ctrl.offPolicy()
then the optimal control policy is given by
uopt = lambda t,x: Wa.dot(Ctrl.psi_func(x))