OpenControl.ADP_control¶
Submodules¶
Package Contents¶
Classes¶
LTI – represents a state-space LTI system.
NonLin – represents a nonlinear system defined by ODEs.
LTIController – a continuous controller for an LTI system.
NonLinController – a continuous controller for a nonlinear system.
Logger – real-time visualization of simulation results using Tensorboard.
- class OpenControl.ADP_control.LTI(A, B, C=1, D=0)¶
This class represents a state-space LTI system.
- dimension¶
(n_state, n_input).
- Type
tuple
- model¶
{A, B, C, D, dimension}.
- Type
dict
- max_step¶
the maximum step size for the ODE solver. Defaults to 1e-3.
- Type
float, optional
- algo¶
the solver algorithm: ‘RK45’, ‘RK23’ or ‘DOP853’. Defaults to ‘RK45’.
- Type
str, optional
- t_sim¶
time span for simulation (start, stop). Defaults to (0, 10).
- Type
tuple, optional
- x0¶
the initial state. Defaults to np.ones((n,)).
- Type
1xn array, optional
- sample_time¶
the sample time. Defaults to 1e-2.
- Type
float, optional
- _check_model(self)¶
- setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)¶
Run this function before any simulations
- Parameters
max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.
algo (str, optional) – the solver algorithm: ‘RK45’, ‘RK23’ or ‘DOP853’. Defaults to ‘RK45’.
t_sim (tuple, optional) – time span for simulation (start, stop). Defaults to (0, 10).
x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).
sample_time (float, optional) – the sample time. Defaults to 1e-2.
- integrate(self, x0, u, t_span)¶
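As a rough sketch (not the library's actual implementation), `integrate` presumably advances dx/dt = Ax + Bu with an ODE solver such as SciPy's `solve_ivp`; the matrices `A` and `B` below are hypothetical:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical 2-state, 1-input system matrices
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])

def dot_x(t, x, u):
    # dx/dt = A x + B u
    return A @ x + B @ np.atleast_1d(u)

u = 1.0                  # constant input held over the span
x0 = np.ones(2)          # matches the np.ones((n,)) default
sol = solve_ivp(lambda t, x: dot_x(t, x, u), (0, 10), x0,
                method='RK45', max_step=1e-3)
x_final = sol.y[:, -1]   # state at t_stop; approaches -inv(A) @ B * u = [0.5, 0]
```

The `method` and `max_step` arguments correspond to the `algo` and `max_step` simulation parameters above.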
- class OpenControl.ADP_control.NonLin(dot_x, dimension)¶
This class represents a nonlinear system defined by ODEs.
- dot_x¶
the dx/dt function; returns a 1D array
- Type
func(t,x,u)
- dimension¶
(n_state, n_input)
- Type
tuple
- max_step¶
the maximum step size for the ODE solver. Defaults to 1e-3.
- Type
float, optional
- algo¶
the solver algorithm: ‘RK45’, ‘RK23’ or ‘DOP853’. Defaults to ‘RK45’.
- Type
str, optional
- t_sim¶
time span for simulation (start, stop). Defaults to (0, 10).
- Type
tuple, optional
- x0¶
the initial state. Defaults to np.ones((n,)).
- Type
1xn array, optional
- sample_time¶
the sample time. Defaults to 1e-2.
- Type
float, optional
- setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)¶
Run this function before any simulations
- Parameters
max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.
algo (str, optional) – the solver algorithm: ‘RK45’, ‘RK23’ or ‘DOP853’. Defaults to ‘RK45’.
t_sim (tuple, optional) – time span for simulation (start, stop). Defaults to (0, 10).
x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).
sample_time (float, optional) – the sample time. Defaults to 1e-2.
- integrate(self, x0, u, t_span, t_eval=None)¶
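A minimal example of a `dot_x` function with the required `func(t, x, u)` signature, using a hypothetical damped pendulum; this sketch integrates one sample interval directly with SciPy rather than through the library:

```python
import numpy as np
from scipy.integrate import solve_ivp

def dot_x(t, x, u):
    # Hypothetical damped pendulum with a torque input;
    # returns dx/dt as a 1D array, matching the func(t, x, u) signature
    return np.array([x[1], -np.sin(x[0]) - 0.5 * x[1] + u])

dimension = (2, 1)  # (n_state, n_input)

# One zero-order-hold step: the input is held constant over a sample interval
u = 0.0
sol = solve_ivp(lambda t, x: dot_x(t, x, u), (0.0, 0.01), np.ones(2),
                method='RK45', max_step=1e-3)
```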
- class OpenControl.ADP_control.LTIController(system, log_dir='results')¶
This class represents a continuous controller for an LTI system.
- system¶
an object of the LTI class
- Type
LTI class
- log_dir¶
the folder containing all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
an object of the Logger class, used for logging state signals
- Type
Logger class
- K0¶
The initial value of the K matrix. Defaults to np.zeros((m,n)).
- Type
mxn array, optional
- Q¶
The Q matrix. Defaults to 1.
- Type
nxn array, optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data points for each learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).
- Type
func(t), optional
- logK¶
logger of the K matrix
- Type
Logger class
- logP¶
logger of the P matrix
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- step(self, x0, u, t_span)¶
Step response of the system.
- Parameters
x0 (1D array) – initial state for simulation
u (1D array) – the value of input within t_span
t_span (list) – (t_start, t_stop)
- Returns
t_span, state at t_span (x_start, x_stop)
- Return type
list, 2D array
- LQR(self, Q=None, R=None)¶
This function solves the Riccati equation for the defined value function
- Parameters
Q (nxn array, optional) – the Q matrix. Defaults to 1.
R (mxm array, optional) – the R matrix. Defaults to 1.
- Returns
the K, P matrix
- Return type
mxn array, nxn array
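`LQR` solves the continuous algebraic Riccati equation; a sketch of the equivalent computation with SciPy, using hypothetical `A`, `B`, `Q`, `R`:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical system and cost matrices
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

# P solves the continuous algebraic Riccati equation:
#   A^T P + P A - P B R^{-1} B^T P + Q = 0
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)  # optimal gain, u = -K x

# The closed loop A - B K is Hurwitz (all eigenvalues in the open left half-plane)
eigs = np.linalg.eigvals(A - B @ K)
```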
- _isStable(self, A)¶
- onPolicy(self, stop_thres=0.001, viz=True)¶
Uses the on-policy approach to find the optimal adaptive feedback controller; requires only the dimension of the system
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
- Raises
ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied
- Returns
the optimal K, P matrix
- Return type
mxn array, nxn array
- _afterGainKopt(self, t_plot, x_plot, Kopt, section)¶
- _rowGainOnPloicy(self, K, x_sample, t_sample)¶
- setPolicyParam(self, K0=None, Q=None, R=None, data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on- and off-policy algorithms. Initialize loggers for the K and P matrices
- Parameters
K0 (mxn array, optional) – The initial value of the K matrix. Defaults to np.zeros((m,n)).
Q (nxn array, optional) – The Q matrix. Defaults to 1.
R (mxm array, optional) – The R matrix. Defaults to 1.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data points for each learning iteration. Defaults to 10.
explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).
- Raises
ValueError – raised when the initial value of the K matrix is not admissible
Note
The K0 matrix must be admissible
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
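The data-count condition in the note can be checked up front; `min_num_data` below is a hypothetical helper, not part of the package:

```python
def min_num_data(n, m):
    # Lower bound from the note above: num_data >= n(n+1) + 2mn,
    # with n states and m inputs
    return n * (n + 1) + 2 * m * n

# For a 2-state, 1-input system: 2*3 + 2*1*2 = 10, the default num_data
needed = min_num_data(2, 1)
```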
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Uses the off-policy approach to find the optimal adaptive feedback controller; requires only the dimension of the system
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Raises
ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied
- Returns
the optimal K, P matrix
- Return type
mxn array, nxn array
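For intuition, the model-based analogue of this policy iteration is Kleinman's algorithm, which alternates policy evaluation (a Lyapunov equation) and policy improvement; the off-policy ADP method estimates the same updates from collected state data without using A. A sketch with hypothetical matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical system; the data-driven method would not need A
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))  # admissible K0 (this A is already Hurwitz)
for _ in range(30):   # max_iter, matching the default
    Ak = A - B @ K
    # Policy evaluation: solve Ak^T P + P Ak = -(Q + K^T R K)
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B^T P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.max(np.abs(K_new - K)) < 1e-3:  # stop_thres
        K = K_new
        break
    K = K_new

# The iteration converges to the LQR solution
P_are = solve_continuous_are(A, B, Q, R)
K_are = np.linalg.solve(R, B.T @ P_are)
```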
- _policyEval(self, dxx, Ixx, Ixu)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- class OpenControl.ADP_control.NonLinController(system, log_dir='results')¶
This class represents a continuous controller for a nonlinear system.
- system¶
an object of the NonLin class
- Type
NonLin class
- log_dir¶
the folder containing all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
an object of the Logger class, used for logging state signals
- Type
Logger class
- u0¶
The initial feedback control policy. Defaults to 0.
- Type
func(x), optional
- q_func¶
the q(x) function. Defaults to NonLinController.default_q_func.
- Type
func(x), optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- phi_func¶
the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func
- Type
list of func(x), optional
- psi_func¶
the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func
- Type
list of func(x), optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data points for each learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).
- Type
func(t), optional
- logWa¶
logger for the actor weights
- Type
Logger class
- logWc¶
logger for the critic weights
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- setPolicyParam(self, q_func=None, R=None, phi_func=None, psi_func=None, u0=lambda x: ..., data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on- and off-policy algorithms. Initialize loggers for the actor and critic weights
- Parameters
q_func (func(x), optional) – the q(x) function. Defaults to NonLinController.default_q_func
R (mxm array, optional) – The R matrix. Defaults to 1.
phi_func (list of func(x), optional) – the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func
psi_func (list of func(x), optional) – the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func
u0 (func(x), optional) – The initial feedback control policy. Defaults to 0.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data points for each learning iteration. Defaults to 10.
explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).
Note
u0 must be an admissible controller
the sequences of basis functions should be linearly independent and smooth
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
- step(self, dot_x, x0, t_span)¶
Step response of the input-free system
- Parameters
dot_x (func(x)) – the input-free ODE function
x0 (1D array) – the initial state
t_span (tuple) – (t_start, t_stop)
- Returns
t_span, state at t_span (x_start, x_stop)
- Return type
list, 2D array
- feedback(self, viz=True)¶
Check stability of the initial control policy u0
- Parameters
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
- Returns
t_plot and x_plot
- Return type
list, 2D array
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Using Off-policy approach to find optimal adaptive feedback controller, requires only the dimension of the system
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
unlearned_compare (boolean) – True to log unlearned state data, for comparison purposes.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Returns
the final updated weights of the critic and actor neural nets.
- Return type
array, array
- _unlearn_controller(self, t_plot, x_plot, section)¶
- _afterGainWopt(self, t_plot, x_plot, Waopt, section)¶
- _policyEval(self, dphi, Iq, Iupsi, Ipsipsi)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- static default_psi_func(x)¶
The default sequence of basis functions to approximate the actor
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
- static default_phi_func(x)¶
The default sequence of basis functions to approximate the critic
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
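The exact default basis is not reproduced here; below is a plausible sketch of a quadratic polynomial critic basis (hypothetical, the package's actual default may differ):

```python
import numpy as np
from itertools import combinations_with_replacement

def default_phi_func(x):
    # Hypothetical quadratic polynomial basis for the critic: all
    # monomials x_i * x_j (the real default may differ)
    x = np.asarray(x).ravel()
    return [x[i] * x[j]
            for i, j in combinations_with_replacement(range(len(x)), 2)]

basis = default_phi_func([1.0, 2.0])  # [x1^2, x1*x2, x2^2] = [1.0, 2.0, 4.0]
```

Such monomial bases are linearly independent and smooth, satisfying the note in setPolicyParam.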
- static default_q_func(x)¶
The default q(x) function
- Parameters
x (1D array) – the state vector
- Returns
the value of q(x)
- Return type
float
- class OpenControl.ADP_control.Logger(log_dir='results', filename_suffix='')¶
Bases:
object
Real-time visualization of simulation results using Tensorboard
- writer¶
the underlying SummaryWriter object
- Type
class
- log(self, section, signals, step)¶
Log the signals under the given section name at the given step
- Parameters
section (str) – name of signals
signals (array) – signals
step (int) – only int is accepted; convert float timesteps to int
- end_log(self)¶
Call this function to end logging
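To illustrate the interface without a Tensorboard dependency, here is a hypothetical in-memory stand-in (the real Logger wraps a SummaryWriter and writes event files):

```python
# Hypothetical in-memory stand-in that mirrors the Logger interface
class MemoryLogger:
    def __init__(self, log_dir='results', filename_suffix=''):
        self.records = {}

    def log(self, section, signals, step):
        # As with the real Logger, only int steps are accepted
        if not isinstance(step, int):
            raise TypeError('convert float timestep to int first')
        self.records.setdefault(section, []).append((step, signals))

    def end_log(self):
        pass  # the real Logger would flush and close its writer

logger = MemoryLogger()
for step, x in enumerate([0.1, 0.2, 0.3]):
    logger.log('state', x, step)
logger.end_log()
```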