OpenControl.ADP_control.controller¶
Module Contents¶
Classes¶
LTIController – a continuous-time controller for an LTI system
NonLinController – a continuous-time controller for a non-linear system
- class OpenControl.ADP_control.controller.LTIController(system, log_dir='results')¶
A continuous-time controller for an LTI system.
- system¶
an object of the LTI class
- Type
LTI class
- log_dir¶
the folder that contains all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
a Logger object, used for logging state signals
- Type
Logger class
- K0¶
The initial value of K matrix. Defaults to np.zeros((m,n)).
- Type
mxn array, optional
- Q¶
The Q matrix. Defaults to 1.
- Type
nxn array, optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data samples per learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Type
func(t), optional
- logK¶
logger of the K matrix
- Type
Logger class
- logP¶
logger of the P matrix
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- step(self, x0, u, t_span)¶
Step response of the system.
- Parameters
x0 (1D array) – initial state for simulation
u (1D array) – the input value applied over t_span
t_span (list) – (t_start, t_stop)
- Returns
the time points over t_span and the states at those points (x_start, x_stop)
- Return type
list, 2D array
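A hedged usage sketch, continuing the construction above and assuming a system with n = 2 states and m = 1 input:

    import numpy as np

    x0 = np.zeros(2)                      # initial state (n = 2 assumed)
    u = np.array([1.0])                   # input value held over t_span (m = 1 assumed)
    t, x = ctrl.step(x0, u, [0.0, 0.1])   # t: time points, x: 2D array of states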
- LQR(self, Q=None, R=None)¶
Solves the Riccati equation for the defined value function.
- Parameters
Q (nxn array, optional) – the Q matrix. Defaults to 1.
R (mxm array, optional) – the R matrix. Defaults to 1.
- Returns
the K and P matrices
- Return type
mxn array, nxn array
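An illustrative call, again assuming n = 2 and m = 1; the cost matrices here are placeholders, not recommendations:

    import numpy as np

    K, P = ctrl.LQR(Q=np.eye(2), R=np.eye(1))   # K: mxn gain, P: nxn Riccati solution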
- _isStable(self, A)¶
- onPolicy(self, stop_thres=0.001, viz=True)¶
Uses the on-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
- Raises
ValueError – raised when the user-defined number of data samples is too small, leaving the rank condition unsatisfied
- Returns
the optimal K and P matrices
- Return type
mxn array, nxn array
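A sketch of an on-policy run; the learning-stage parameters are configured first via setPolicyParam (documented below):

    import numpy as np

    ctrl.setPolicyParam(Q=np.eye(2), R=np.eye(1))   # see setPolicyParam below
    K_opt, P_opt = ctrl.onPolicy(stop_thres=1e-3, viz=True)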
- _afterGainKopt(self, t_plot, x_plot, Kopt, section)¶
- _rowGainOnPloicy(self, K, x_sample, t_sample)¶
- setPolicyParam(self, K0=None, Q=None, R=None, data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on-policy and off-policy algorithms. Initializes the loggers for the K and P matrices.
- Parameters
K0 (mxn array, optional) – The initial value of K matrix. Defaults to np.zeros((m,n)).
Q (nxn array, optional) – The Q matrix. Defaults to 1.
R (mxm array, optional) – The R matrix. Defaults to 1.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data samples per learning iteration. Defaults to 10.
explore_noise (func(t), optional) – the exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Raises
ValueError – raised when the initial value of the K matrix is not admissible
Note
The K0 matrix must be admissible
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
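A configuration sketch that respects the notes above; the dimensions and the noise signal are assumptions, not prescriptions:

    import numpy as np

    n, m = 2, 1                                  # state/input dimensions (assumed)
    ctrl.setPolicyParam(
        K0=np.zeros((m, n)),                     # must be admissible for this system
        Q=np.eye(n),
        R=np.eye(m),
        data_eval=0.1,                           # must exceed the system sample_time
        num_data=n*(n + 1) + 2*m*n,              # satisfies num_data >= n(n+1) + 2mn
        explore_noise=lambda t: 2*np.sin(100*t),
    )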
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Uses the off-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Raises
ValueError – raised when the user-defined number of data samples is too small, leaving the rank condition unsatisfied
- Returns
the optimal K and P matrices
- Return type
mxn array, nxn array
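An illustrative off-policy run, with parameters configured beforehand via setPolicyParam:

    K_opt, P_opt = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)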
- _policyEval(self, dxx, Ixx, Ixu)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- class OpenControl.ADP_control.controller.NonLinController(system, log_dir='results')¶
A continuous-time controller for a non-linear system.
- system¶
an object of the nonLin class
- Type
nonLin class
- log_dir¶
the folder that contains all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
a Logger object, used for logging state signals
- Type
Logger class
- u0¶
The initial feedback control policy. Defaults to 0.
- Type
func(x), optional
- q_func¶
the q(x) function. Defaults to NonLinController.default_q_func.
- Type
func(x), optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- phi_func¶
the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func.
- Type
list of func(x), optional
- psi_func¶
the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func.
- Type
list of func(x), optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data samples per learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Type
func(t), optional
- logWa¶
logger for the actor weight values
- Type
Logger class
- logWc¶
logger for the critic weight values
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- setPolicyParam(self, q_func=None, R=None, phi_func=None, psi_func=None, u0=lambda x: ..., data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on-policy and off-policy algorithms. Initializes the loggers for the critic and actor weights.
- Parameters
q_func (func(x), optional) – the q(x) function. Defaults to NonLinController.default_q_func.
R (mxm array, optional) – The R matrix. Defaults to 1.
phi_func (list of func(x), optional) – the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func.
psi_func (list of func(x), optional) – the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func.
u0 (func(x), optional) – The initial feedback control policy. Defaults to 0.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data samples per learning iteration. Defaults to 10.
explore_noise (func(t), optional) – the exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
Note
u0 must be an admissible controller
the sequences of basis functions should be linearly independent and smooth
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
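A configuration sketch for an assumed 2-state, 1-input system; the basis functions and q(x) below are illustrative choices, picked to be linearly independent and smooth as the note requires:

    import numpy as np

    # illustrative quadratic critic basis and odd-polynomial actor basis (n = 2)
    phi = [lambda x: x[0]**2, lambda x: x[0]*x[1], lambda x: x[1]**2]
    psi = [lambda x: x[0], lambda x: x[1], lambda x: x[0]**3, lambda x: x[1]**3]

    nl_ctrl.setPolicyParam(
        q_func=lambda x: float(np.dot(x, x)),    # q(x) = x^T x (assumed choice)
        R=np.eye(1),
        phi_func=phi,
        psi_func=psi,
        u0=lambda x: 0.0,                        # must be an admissible controller
        data_eval=0.1,
        num_data=10,
        explore_noise=lambda t: 2*np.sin(100*t),
    )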
- step(self, dot_x, x0, t_span)¶
Step response of the autonomous (no-input) system.
- Parameters
dot_x (func(x)) – the autonomous (no-input) ODE function
x0 (1D array) – the initial state
t_span (tuple) – (t_start, t_stop)
- Returns
the time points over t_span and the states at those points (x_start, x_stop)
- Return type
list, 2D array
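A hedged sketch with assumed autonomous dynamics:

    import numpy as np

    dot_x = lambda x: np.array([-x[0] + x[1], -x[1]**3])   # assumed no-input ODE
    t, x = nl_ctrl.step(dot_x, np.array([1.0, -1.0]), (0.0, 0.1))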
- feedback(self, viz=True)¶
Check stability of the initial control policy u0
- Parameters
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
- Returns
t_plot and x_plot
- Return type
list, 2D array
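An illustrative call, continuing the sketches above; u0 is the initial policy set via setPolicyParam, and the returned trajectory shows how the uncontrolled policy behaves:

    t_plot, x_plot = nl_ctrl.feedback(viz=True)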
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Uses the off-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
unlearned_compare (boolean) – True to log the unlearned state data, for comparison purposes.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Returns
the final updated weights of the critic and actor neural nets
- Return type
array, array
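An illustrative run; the unpacking order below assumes the docstring's "critic, actor" return order:

    Wc, Wa = nl_ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)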
- _unlearn_controller(self, t_plot, x_plot, section)¶
- _afterGainWopt(self, t_plot, x_plot, Waopt, section)¶
- _policyEval(self, dphi, Iq, Iupsi, Ipsipsi)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- static default_psi_func(x)¶
The default sequence of basis functions used to approximate the actor.
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
- static default_phi_func(x)¶
The default sequence of basis functions used to approximate the critic.
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
- static default_q_func(x)¶
The default q(x) function.
- Parameters
x (1D array) – the state vector
- Returns
the value of q(x) evaluated at the state vector x
- Return type
float
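A minimal call sketch of the static default:

    import numpy as np
    from OpenControl.ADP_control.controller import NonLinController

    val = NonLinController.default_q_func(np.array([1.0, 2.0]))   # returns a float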