OpenControl.ADP_control

Package Contents

Classes

LTI

This class represents a state-space LTI system.

NonLin

This class represents a non-linear system described by ODEs.

LTIController

This class represents a continuous-time controller for LTI systems.

NonLinController

This class represents a continuous-time controller for non-linear systems.

Logger

Real-time visualization of simulation results using TensorBoard.

class OpenControl.ADP_control.LTI(A, B, C=1, D=0)

This class represents a state-space LTI system.

dimension

(n_state, n_input).

Type

tuple

model

{A, B, C, D, dimension}.

Type

dict

max_step

the maximum step size for the ODE solver. Defaults to 1e-3.

Type

float, optional

algo

the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

Type

str, optional

t_sim

the simulation time interval (start, stop). Defaults to (0, 10).

Type

tuple, optional

x0

the initial state. Defaults to np.ones((n,)).

Type

1xn array, optional

sample_time

the sample time. Defaults to 1e-2.

Type

float, optional

_check_model(self)
setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)

Call this method before running any simulation.

Parameters
  • max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.

  • algo (str, optional) – the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

  • t_sim (tuple, optional) – the simulation time interval (start, stop). Defaults to (0, 10).

  • x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).

  • sample_time (float, optional) – the sample time. Defaults to 1e-2.

integrate(self, x0, u, t_span)
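
Example

A minimal usage sketch of the LTI class based on the documented signatures. The system matrices, the constant input, and the chosen horizon are illustrative assumptions, not part of the documented API.

    import numpy as np
    from OpenControl.ADP_control import LTI

    # a 2-state, 1-input system dx/dt = A x + B u (illustrative matrices)
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    plant = LTI(A, B)

    # configure the solver before running any simulation
    plant.setSimulationParam(max_step=1e-3, algo='RK45', t_sim=(0, 10),
                             x0=np.ones((2,)), sample_time=1e-2)

    # integrate over a short horizon with a constant input
    result = plant.integrate(np.ones((2,)), np.array([1.0]), (0, 0.01))
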
class OpenControl.ADP_control.NonLin(dot_x, dimension)

This class represents a non-linear system described by ODEs.

dot_x

the dx/dt function; returns a 1D array.

Type

func(t,x,u)

dimension

(n_state, n_input)

Type

tuple

max_step

the maximum step size for the ODE solver. Defaults to 1e-3.

Type

float, optional

algo

the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

Type

str, optional

t_sim

the simulation time interval (start, stop). Defaults to (0, 10).

Type

tuple, optional

x0

the initial state. Defaults to np.ones((n,)).

Type

1xn array, optional

sample_time

the sample time. Defaults to 1e-2.

Type

float, optional

setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)

Call this method before running any simulation.

Parameters
  • max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.

  • algo (str, optional) – the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

  • t_sim (tuple, optional) – the simulation time interval (start, stop). Defaults to (0, 10).

  • x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).

  • sample_time (float, optional) – the sample time. Defaults to 1e-2.

integrate(self, x0, u, t_span, t_eval=None)
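
Example

A minimal usage sketch of the NonLin class. The pendulum-like dynamics are illustrative; only the constructor and setSimulationParam signatures are taken from the documentation.

    import numpy as np
    from OpenControl.ADP_control import NonLin

    # dx/dt = f(t, x, u), returning a 1D array (illustrative dynamics)
    def dot_x(t, x, u):
        return np.array([x[1], -np.sin(x[0]) - x[1] + u[0]])

    plant = NonLin(dot_x, dimension=(2, 1))   # (n_state, n_input)
    plant.setSimulationParam(t_sim=(0, 10), x0=np.array([1.0, 0.0]))
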
class OpenControl.ADP_control.LTIController(system, log_dir='results')

This class represents a continuous-time controller for LTI systems.

system

an instance of the LTI class

Type

LTI class

log_dir

the folder that contains all log files. Defaults to 'results'.

Type

string, optional

logX

a Logger instance used for logging the state signals

Type

Logger class

K0

The initial value of K matrix. Defaults to np.zeros((m,n)).

Type

mxn array, optional

Q

The Q matrix. Defaults to 1.

Type

nxn array, optional

R

The R matrix. Defaults to 1.

Type

mxm array, optional

data_eval

data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

Type

float, optional

num_data

the number of data points collected for each learning iteration. Defaults to 10.

Type

int, optional

explore_noise

The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Type

func(t), optional

logK

logger of the K matrix

Type

Logger class

logP

logger of the P matrix

Type

Logger class

t_plot, x_plot

used for logging and plotting the simulation results

Type

float, array

viz

True to visualize results on TensorBoard. Defaults to True.

Type

boolean

step(self, x0, u, t_span)

Step response of the system.

Parameters
  • x0 (1D array) – initial state for simulation

  • u (1D array) – the value of the input over t_span

  • t_span (list) – (t_start, t_stop)

Returns

t_span and the states at its endpoints (x_start, x_stop)

Return type

list, 2D array

LQR(self, Q=None, R=None)

Solve the Riccati equation for the specified value function.

Parameters
  • Q (nxn array, optional) – the Q matrix. Defaults to 1.

  • R (mxm array, optional) – the R matrix. Defaults to 1.

Returns

the K and P matrices

Return type

mxn array, nxn array
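
Example

A sketch of solving the LQR baseline and simulating one sample with the resulting gain. The plant, the weight matrices, and the feedback u = -K x are illustrative assumptions.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)))
    ctrl = LTIController(plant, log_dir='results')

    # solve the Riccati equation for the chosen weights
    K, P = ctrl.LQR(Q=np.eye(2), R=np.eye(1))

    # simulate one sample with the state feedback u = -K x
    x0 = np.ones((2,))
    t, x = ctrl.step(x0, -K @ x0, [0.0, 0.01])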

_isStable(self, A)
onPolicy(self, stop_thres=0.001, viz=True)

Use the on-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (bool, optional) – True for logging data. Defaults to True.

Raises

ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied

Returns

the optimal K and P matrices

Return type

mxn array, nxn array

_afterGainKopt(self, t_plot, x_plot, Kopt, section)
_rowGainOnPloicy(self, K, x_sample, t_sample)
setPolicyParam(self, K0=None, Q=None, R=None, data_eval=0.1, num_data=10, explore_noise=lambda t: ...)

Set up the policy parameters for both the on-policy and off-policy algorithms. Initialize the loggers for the K and P matrices.

Parameters
  • K0 (mxn array, optional) – The initial value of K matrix. Defaults to np.zeros((m,n)).

  • Q (nxn array, optional) – The Q matrix. Defaults to 1.

  • R (mxm array, optional) – The R matrix. Defaults to 1.

  • data_eval (float, optional) – data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

  • num_data (int, optional) – the number of data points collected for each learning iteration. Defaults to 10.

  • explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Raises

ValueError – raised when the initial value of the K matrix is not admissible

Note

  • The K0 matrix must be admissible

  • data_eval must be larger than the sample_time

  • num_data >= n(n+1) + 2mn
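
Example

A sketch of a setPolicyParam call for a 2-state, 1-input plant. The matrices are illustrative; the zero K0 is admissible here only because the assumed A matrix is already stable, and num_data = 10 meets the bound n(n+1) + 2mn = 10 for n = 2, m = 1.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)), sample_time=1e-2)
    ctrl = LTIController(plant)

    n, m = 2, 1
    ctrl.setPolicyParam(K0=np.zeros((m, n)),      # must be an admissible gain
                        Q=np.eye(n), R=np.eye(m),
                        data_eval=0.1,            # must be larger than sample_time
                        num_data=10,              # >= n(n+1) + 2mn = 10 here
                        explore_noise=lambda t: 2 * np.sin(100 * t))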

offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)

Use the off-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (bool, optional) – True for logging data. Defaults to True.

  • max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.

Raises

ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied

Returns

the optimal K and P matrices

Return type

mxn array, nxn array
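
Example

An end-to-end learning sketch; the plant and weights are illustrative. onPolicy is called the same way, without max_iter.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)))
    ctrl = LTIController(plant)
    ctrl.setPolicyParam(K0=np.zeros((1, 2)), Q=np.eye(2), R=np.eye(1))

    # learn the optimal gain from data
    K_opt, P_opt = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)
    # K_opt, P_opt = ctrl.onPolicy(stop_thres=1e-3, viz=True)   # on-policy variant

    # compare with the model-based LQR solution
    K_lqr, P_lqr = ctrl.LQR(Q=np.eye(2), R=np.eye(1))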

_policyEval(self, dxx, Ixx, Ixu)
_getRowOffPolicyMatrix(self, t_sample, x_sample)
class OpenControl.ADP_control.NonLinController(system, log_dir='results')

This class represents a continuous-time controller for non-linear systems.

system

an instance of the NonLin class

Type

NonLin class

log_dir

the folder that contains all log files. Defaults to 'results'.

Type

string, optional

logX

a Logger instance used for logging the state signals

Type

Logger class

u0

The initial feedback control policy. Defaults to 0.

Type

func(x), optional

q_func

the function q(x). Defaults to NonLinController.default_q_func.

Type

func(x), optional

R

The R matrix. Defaults to 1.

Type

mxm array, optional

phi_func

the sequence of basis functions used to approximate the critic, \phi_j(x). Defaults to NonLinController.default_phi_func.

Type

list of func(x), optional

psi_func

the sequence of basis functions used to approximate the actor, \psi_j(x). Defaults to NonLinController.default_psi_func.

Type

list of func(x), optional

data_eval

data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

Type

float, optional

num_data

the number of data points collected for each learning iteration. Defaults to 10.

Type

int, optional

explore_noise

The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Type

func(t), optional

logWa

logger for the actor weights

Type

Logger class

logWc

logger for the critic weights

Type

Logger class

t_plot, x_plot

used for logging and plotting the simulation results

Type

float, array

viz

True to visualize results on TensorBoard. Defaults to True.

Type

boolean

setPolicyParam(self, q_func=None, R=None, phi_func=None, psi_func=None, u0=lambda x: ..., data_eval=0.1, num_data=10, explore_noise=lambda t: ...)

Set up the policy parameters for the on/off-policy algorithms. Initialize the loggers for the actor and critic weights.

Parameters
  • q_func (func(x), optional) – the function q(x). Defaults to NonLinController.default_q_func.

  • R (mxm array, optional) – The R matrix. Defaults to 1.

  • phi_func (list of func(x), optional) – the sequence of basis functions used to approximate the critic, \phi_j(x). Defaults to NonLinController.default_phi_func.

  • psi_func (list of func(x), optional) – the sequence of basis functions used to approximate the actor, \psi_j(x). Defaults to NonLinController.default_psi_func.

  • u0 (func(x), optional) – The initial feedback control policy. Defaults to 0.

  • data_eval (float, optional) – data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

  • num_data (int, optional) – the number of data points collected for each learning iteration. Defaults to 10.

  • explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t: 2*np.sin(100*t).

Note

  • u0 must be an admissible controller

  • the sequences of basis functions \phi_j(x), \psi_j(x) should be linearly independent and smooth

  • data_eval must be larger than the sample_time

  • num_data >= n(n+1) + 2mn
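
Example

A sketch of configuring a NonLinController for a 2-state, 1-input plant. The dynamics, the initial policy u0, and the noise are illustrative; u0 is chosen so that the closed loop is stable (admissible) for these dynamics.

    import numpy as np
    from OpenControl.ADP_control import NonLin, NonLinController

    def dot_x(t, x, u):
        return np.array([x[1], -x[0] - x[1] + u[0]])   # illustrative dynamics

    plant = NonLin(dot_x, dimension=(2, 1))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.array([1.0, -1.0]))

    ctrl = NonLinController(plant, log_dir='results')
    ctrl.setPolicyParam(u0=lambda x: np.array([-x[0] - x[1]]),   # admissible initial policy
                        R=np.eye(1),
                        data_eval=0.1, num_data=10,
                        explore_noise=lambda t: 2 * np.sin(100 * t))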

step(self, dot_x, x0, t_span)

Step response of the system without input.

Parameters
  • dot_x (func(x)) – the ODE function of the system with no input applied

  • x0 (1D array) – the initial state

  • t_span (tuple) – (t_start, t_stop)

Returns

t_span and the states at its endpoints (x_start, x_stop)

Return type

list, 2D array

feedback(self, viz=True)

Check stability of the initial control policy u0

Parameters

viz (boolean) – True to visualize results on TensorBoard. Defaults to True.

Returns

t_plot and x_plot

Return type

list, 2D array

offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)

Use the off-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (boolean) – True to visualize results on TensorBoard. Defaults to True.

  • unlearned_compare (boolean) – True to log the unlearned state data, for comparison purposes.

  • max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.

Returns

the final updated weights of the critic and actor neural networks.

Return type

array, array
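
Example

A continuation of the configuration sketch shown after setPolicyParam above: check the initial policy, then learn off-policy. The unpacking order of the returned weights (critic first, then actor) follows the documented return description and is otherwise an assumption.

    # verify that the initial policy u0 stabilizes the system
    t_plot, x_plot = ctrl.feedback(viz=True)

    # learn the optimal controller from data
    Wc, Wa = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)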

_unlearn_controller(self, t_plot, x_plot, section)
_afterGainWopt(self, t_plot, x_plot, Waopt, section)
_policyEval(self, dphi, Iq, Iupsi, Ipsipsi)
_getRowOffPolicyMatrix(self, t_sample, x_sample)
static default_psi_func(x)

The default sequence of basis functions used to approximate the actor.

Parameters

x (1xn array) – the state vector

Returns

the polynomial basis functions. If x=[x_1,x_2]^T then \psi(x) = [x_1, x_2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3]^T

Return type

list func(x)

static default_phi_func(x)

The default sequence of basis functions used to approximate the critic.

Parameters

x (1xn array) – the state vector

Returns

the polynomial basis functions. If x=[x_1,x_2]^T then \phi(x) = [x_1^2, x_1x_2, x_2^2, x_1^4, x_1^2x_2^2, x_2^4]^T

Return type

list func(x)

static default_q_func(x)

The default q(x) function.

Parameters

x (1D array) – the state vector

Returns

x^Tx

Return type

float
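
Example

Hypothetical custom replacements for the default q(x) and basis functions, assuming q_func maps the state to a scalar and phi_func/psi_func are passed as lists of scalar functions of x (the documented attribute types).

    import numpy as np

    def q_func(x):
        return float(np.dot(x, x))            # q(x) = x^T x, same as the default

    phi_func = [lambda x: x[0] ** 2,          # truncated critic basis for n = 2
                lambda x: x[0] * x[1],
                lambda x: x[1] ** 2]

    psi_func = [lambda x: x[0],               # truncated actor basis for n = 2
                lambda x: x[1],
                lambda x: x[0] ** 3]

    # pass them to the controller, e.g.
    # ctrl.setPolicyParam(q_func=q_func, phi_func=phi_func, psi_func=psi_func)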

class OpenControl.ADP_control.Logger(log_dir='results', filename_suffix='')

Bases: object

Real-time visualization of simulation results using TensorBoard.

writer

the underlying TensorBoard SummaryWriter instance

Type

class

log(self, section, signals, step)

Log the signals under the given section name at the given time step.

Parameters
  • section (str) – the name under which the signals are logged

  • signals (array) – signals

  • step (int) – the time step; only int is accepted, so convert float time steps to int before logging

end_log(self)

Call this function to end logging
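
Example

A minimal Logger sketch based on the documented methods; the section name and the logged signal are illustrative.

    import numpy as np
    from OpenControl.ADP_control import Logger

    logger = Logger(log_dir='results', filename_suffix='demo')
    for k in range(100):
        x = np.array([np.sin(0.1 * k), np.cos(0.1 * k)])   # illustrative signal
        logger.log('state', x, k)                          # step must be an int
    logger.end_log()

    # inspect the logs with:  tensorboard --logdir results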