OpenControl.ADP_control.controller¶
Module Contents¶
Classes¶
LTIController – a continuous-time controller for an LTI system
NonLinController – a continuous-time controller for a non-linear system
- class OpenControl.ADP_control.controller.LTIController(system, log_dir='results')¶
A continuous-time controller for an LTI system.
- system¶
an object of the LTI class
- Type
LTI class
- log_dir¶
the folder that contains all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
a Logger object, used for logging state signals
- Type
Logger class
- K0¶
The initial value of K matrix. Defaults to np.zeros((m,n)).
- Type
mxn array, optional
- Q¶
The Q matrix. Defaults to 1.
- Type
nxn array, optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data samples per learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Type
func(t), optional
- logK¶
logger of the K matrix
- Type
Logger class
- logP¶
logger of the P matrix
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- step(self, x0, u, t_span)¶
Step response of the system.
- Parameters
x0 (1D array) – initial state for simulation
u (1D array) – the input value applied over t_span
t_span (list) – (t_start, t_stop)
- Returns
the time points over t_span and the states at those points (x_start, x_stop)
- Return type
list, 2D array
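A hedged usage sketch, continuing the construction above and assuming a system with n = 2 states and m = 1 input:

    import numpy as np

    x0 = np.zeros(2)                      # initial state (n = 2 assumed)
    u = np.array([1.0])                   # input value held over t_span (m = 1 assumed)
    t, x = ctrl.step(x0, u, [0.0, 0.1])   # t: time points, x: 2D array of states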
- LQR(self, Q=None, R=None)¶
Solves the Riccati equation for the defined value function.
- Parameters
Q (nxn array, optional) – the Q matrix. Defaults to 1.
R (mxm array, optional) – the R matrix. Defaults to 1.
- Returns
the K and P matrices
- Return type
mxn array, nxn array
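An illustrative call, again assuming n = 2 and m = 1; the cost matrices here are placeholders, not recommendations:

    import numpy as np

    K, P = ctrl.LQR(Q=np.eye(2), R=np.eye(1))   # K: mxn gain, P: nxn Riccati solution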
- _isStable(self, A)¶
- onPolicy(self, stop_thres=0.001, viz=True)¶
Uses the on-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
- Raises
ValueError – raised when the user-defined number of data samples is too small, leaving the rank condition unsatisfied
- Returns
the optimal K and P matrices
- Return type
mxn array, nxn array
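A sketch of an on-policy run; the learning-stage parameters are configured first via setPolicyParam (documented below):

    import numpy as np

    ctrl.setPolicyParam(Q=np.eye(2), R=np.eye(1))   # see setPolicyParam below
    K_opt, P_opt = ctrl.onPolicy(stop_thres=1e-3, viz=True)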
- _afterGainKopt(self, t_plot, x_plot, Kopt, section)¶
- _rowGainOnPloicy(self, K, x_sample, t_sample)¶
- setPolicyParam(self, K0=None, Q=None, R=None, data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on-policy and off-policy algorithms. Initializes the loggers for the K and P matrices.
- Parameters
K0 (mxn array, optional) – The initial value of K matrix. Defaults to np.zeros((m,n)).
Q (nxn array, optional) – The Q matrix. Defaults to 1.
R (mxm array, optional) – The R matrix. Defaults to 1.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data samples per learning iteration. Defaults to 10.
explore_noise (func(t), optional) – the exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Raises
ValueError – raised when the initial value of the K matrix is not admissible
Note
The K0 matrix must be admissible
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
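A configuration sketch that respects the notes above; the dimensions and the noise signal are assumptions, not prescriptions:

    import numpy as np

    n, m = 2, 1                                  # state/input dimensions (assumed)
    ctrl.setPolicyParam(
        K0=np.zeros((m, n)),                     # must be admissible for this system
        Q=np.eye(n),
        R=np.eye(m),
        data_eval=0.1,                           # must exceed the system sample_time
        num_data=n*(n + 1) + 2*m*n,              # satisfies num_data >= n(n+1) + 2mn
        explore_noise=lambda t: 2*np.sin(100*t),
    )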
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Uses the off-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (bool, optional) – True for logging data. Defaults to True.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Raises
ValueError – raised when the user-defined number of data samples is too small, leaving the rank condition unsatisfied
- Returns
the optimal K and P matrices
- Return type
mxn array, nxn array
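An illustrative off-policy run, with parameters configured beforehand via setPolicyParam:

    K_opt, P_opt = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)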
- _policyEval(self, dxx, Ixx, Ixu)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- class OpenControl.ADP_control.controller.NonLinController(system, log_dir='results')¶
A continuous-time controller for a non-linear system.
- system¶
an object of the nonLin class
- Type
nonLin class
- log_dir¶
the folder that contains all log files. Defaults to ‘results’.
- Type
string, optional
- logX¶
a Logger object, used for logging state signals
- Type
Logger class
- u0¶
The initial feedback control policy. Defaults to 0.
- Type
func(x), optional
- q_func¶
the q(x) function. Defaults to NonLinController.default_q_func.
- Type
func(x), optional
- R¶
The R matrix. Defaults to 1.
- Type
mxm array, optional
- phi_func¶
the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func.
- Type
list of func(x), optional
- psi_func¶
the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func.
- Type
list of func(x), optional
- data_eval¶
data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
- Type
float, optional
- num_data¶
the number of data samples per learning iteration. Defaults to 10.
- Type
int, optional
- explore_noise¶
The exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
- Type
func(t), optional
- logWa¶
logger for the actor weight values
- Type
Logger class
- logWc¶
logger for the critic weight values
- Type
Logger class
- t_plot, x_plot
used for logging and plotting simulation results
- Type
float, array
- viz¶
True to visualize results on Tensorboard. Defaults to True.
- Type
boolean
- setPolicyParam(self, q_func=None, R=None, phi_func=None, psi_func=None, u0=lambda x: ..., data_eval=0.1, num_data=10, explore_noise=lambda t: ...)¶
Set up policy parameters for both the on-policy and off-policy algorithms. Initializes the loggers for the critic and actor weights.
- Parameters
q_func (func(x), optional) – the q(x) function. Defaults to NonLinController.default_q_func.
R (mxm array, optional) – The R matrix. Defaults to 1.
phi_func (list of func(x), optional) – the sequence of basis functions used to approximate the critic. Defaults to NonLinController.default_phi_func.
psi_func (list of func(x), optional) – the sequence of basis functions used to approximate the actor. Defaults to NonLinController.default_psi_func.
u0 (func(x), optional) – The initial feedback control policy. Defaults to 0.
data_eval (float, optional) – data_eval x num_data is the time interval for each policy update. Defaults to 0.1.
num_data (int, optional) – the number of data samples per learning iteration. Defaults to 10.
explore_noise (func(t), optional) – the exploration noise applied during the learning stage. Defaults to lambda t: 2*np.sin(100*t).
Note
u0 must be an admissible controller
the sequences of basis functions should be linearly independent and smooth
data_eval must be larger than the sample_time
num_data >= n(n+1) + 2mn
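A configuration sketch for an assumed 2-state, 1-input system; the basis functions and q(x) below are illustrative choices, picked to be linearly independent and smooth as the note requires:

    import numpy as np

    # illustrative quadratic critic basis and odd-polynomial actor basis (n = 2)
    phi = [lambda x: x[0]**2, lambda x: x[0]*x[1], lambda x: x[1]**2]
    psi = [lambda x: x[0], lambda x: x[1], lambda x: x[0]**3, lambda x: x[1]**3]

    nl_ctrl.setPolicyParam(
        q_func=lambda x: float(np.dot(x, x)),    # q(x) = x^T x (assumed choice)
        R=np.eye(1),
        phi_func=phi,
        psi_func=psi,
        u0=lambda x: 0.0,                        # must be an admissible controller
        data_eval=0.1,
        num_data=10,
        explore_noise=lambda t: 2*np.sin(100*t),
    )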
- step(self, dot_x, x0, t_span)¶
Step response of the autonomous (no-input) system.
- Parameters
dot_x (func(x)) – the autonomous (no-input) ODE function
x0 (1D array) – the initial state
t_span (tuple) – (t_start, t_stop)
- Returns
the time points over t_span and the states at those points (x_start, x_stop)
- Return type
list, 2D array
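A hedged sketch with assumed autonomous dynamics:

    import numpy as np

    dot_x = lambda x: np.array([-x[0] + x[1], -x[1]**3])   # assumed no-input ODE
    t, x = nl_ctrl.step(dot_x, np.array([1.0, -1.0]), (0.0, 0.1))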
- feedback(self, viz=True)¶
Check stability of the initial control policy u0
- Parameters
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
- Returns
t_plot and x_plot
- Return type
list, 2D array
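An illustrative call, continuing the sketches above; u0 is the initial policy set via setPolicyParam, and the returned trajectory shows how the uncontrolled policy behaves:

    t_plot, x_plot = nl_ctrl.feedback(viz=True)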
- offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)¶
Uses the off-policy approach to find the optimal adaptive feedback controller; requires knowledge of only the system dimensions.
- Parameters
stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.
viz (boolean) – True to visualize results on Tensorboard. Defaults to True.
unlearned_compare (boolean) – True to log the unlearned state data, for comparison purposes.
max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.
- Returns
the final updated weights of the critic and actor neural nets
- Return type
array, array
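An illustrative run; the unpacking order below assumes the docstring's "critic, actor" return order:

    Wc, Wa = nl_ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)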
- _unlearn_controller(self, t_plot, x_plot, section)¶
- _afterGainWopt(self, t_plot, x_plot, Waopt, section)¶
- _policyEval(self, dphi, Iq, Iupsi, Ipsipsi)¶
- _getRowOffPolicyMatrix(self, t_sample, x_sample)¶
- static default_psi_func(x)¶
The default sequence of basis functions used to approximate the actor.
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
- static default_phi_func(x)¶
The default sequence of basis functions used to approximate the critic.
- Parameters
x (1xn array) – the state vector
- Returns
the polynomial basis functions
- Return type
list func(x)
- static default_q_func(x)¶
The default q(x) function.
- Parameters
x (1D array) – the state vector
- Returns
the value of q(x) evaluated at the state vector x
- Return type
float
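A minimal call sketch of the static default:

    import numpy as np
    from OpenControl.ADP_control.controller import NonLinController

    val = NonLinController.default_q_func(np.array([1.0, 2.0]))   # returns a float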