OpenControl.ADP_control

Package Contents

Classes

LTI

This class represents a state-space LTI system.

NonLin

This class represents a non-linear system described by ODEs.

LTIController

This class represents a continuous-time controller for LTI systems.

NonLinController

This class represents a continuous-time controller for non-linear systems.

Logger

Real-time visualization of simulation results using TensorBoard.

class OpenControl.ADP_control.LTI(A, B, C=1, D=0)

This class represents a state-space LTI system.

dimension

(n_state, n_input).

Type

tuple

model

{A, B, C, D, dimension}.

Type

dict

max_step

the maximum step size for the ODE solver. Defaults to 1e-3.

Type

float, optional

algo

the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

Type

str, optional

t_sim

the simulation time interval (start, stop). Defaults to (0, 10).

Type

tuple, optional

x0

the initial state. Defaults to np.ones((n,)).

Type

1xn array, optional

sample_time

the sample time. Defaults to 1e-2.

Type

float, optional

_check_model(self)
setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)

Call this method before running any simulation.

Parameters
  • max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.

  • algo (str, optional) – the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

  • t_sim (tuple, optional) – the simulation time interval (start, stop). Defaults to (0, 10).

  • x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).

  • sample_time (float, optional) – the sample time. Defaults to 1e-2.

integrate(self, x0, u, t_span)
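
Example

A minimal usage sketch of the LTI class based on the documented signatures. The system matrices, the constant input, and the chosen horizon are illustrative assumptions, not part of the documented API.

    import numpy as np
    from OpenControl.ADP_control import LTI

    # a 2-state, 1-input system dx/dt = A x + B u (illustrative matrices)
    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    B = np.array([[0.0], [1.0]])
    plant = LTI(A, B)

    # configure the solver before running any simulation
    plant.setSimulationParam(max_step=1e-3, algo='RK45', t_sim=(0, 10),
                             x0=np.ones((2,)), sample_time=1e-2)

    # integrate over a short horizon with a constant input
    result = plant.integrate(np.ones((2,)), np.array([1.0]), (0, 0.01))
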
class OpenControl.ADP_control.NonLin(dot_x, dimension)

This class represents a non-linear system described by ODEs.

dot_x

the dx/dt function; returns a 1D array.

Type

func(t,x,u)

dimension

(n_state, n_input)

Type

tuple

max_step

the maximum step size for the ODE solver. Defaults to 1e-3.

Type

float, optional

algo

the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

Type

str, optional

t_sim

the simulation time interval (start, stop). Defaults to (0, 10).

Type

tuple, optional

x0

the initial state. Defaults to np.ones((n,)).

Type

1xn array, optional

sample_time

the sample time. Defaults to 1e-2.

Type

float, optional

setSimulationParam(self, max_step=0.001, algo='RK45', t_sim=(0, 10), x0=None, sample_time=0.01)

Call this method before running any simulation.

Parameters
  • max_step (float, optional) – the maximum step size for the ODE solver. Defaults to 1e-3.

  • algo (str, optional) – the ODE solver algorithm: 'RK45', 'RK23' or 'DOP853'. Defaults to 'RK45'.

  • t_sim (tuple, optional) – the simulation time interval (start, stop). Defaults to (0, 10).

  • x0 (1xn array, optional) – the initial state. Defaults to np.ones((n,)).

  • sample_time (float, optional) – the sample time. Defaults to 1e-2.

integrate(self, x0, u, t_span, t_eval=None)
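
Example

A minimal usage sketch of the NonLin class. The pendulum-like dynamics are illustrative; only the constructor and setSimulationParam signatures are taken from the documentation.

    import numpy as np
    from OpenControl.ADP_control import NonLin

    # dx/dt = f(t, x, u), returning a 1D array (illustrative dynamics)
    def dot_x(t, x, u):
        return np.array([x[1], -np.sin(x[0]) - x[1] + u[0]])

    plant = NonLin(dot_x, dimension=(2, 1))   # (n_state, n_input)
    plant.setSimulationParam(t_sim=(0, 10), x0=np.array([1.0, 0.0]))
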
class OpenControl.ADP_control.LTIController(system, log_dir='results')

This class represents a continuous-time controller for LTI systems.

system

an instance of the LTI class

Type

LTI class

log_dir

the folder that contains all log files. Defaults to 'results'.

Type

string, optional

logX

a Logger instance used for logging the state signals

Type

Logger class

K0

The initial value of K matrix. Defaults to np.zeros((m,n)).

Type

mxn array, optional

Q

The Q matrix. Defaults to 1.

Type

nxn array, optional

R

The R matrix. Defaults to 1.

Type

mxm array, optional

data_eval

data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

Type

float, optional

num_data

the number of data points collected for each learning iteration. Defaults to 10.

Type

int, optional

explore_noise

The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Type

func(t), optional

logK

logger of the K matrix

Type

Logger class

logP

logger of the P matrix

Type

Logger class

t_plot, x_plot

used for logging and plotting the simulation results

Type

float, array

viz

True to visualize results on TensorBoard. Defaults to True.

Type

boolean

step(self, x0, u, t_span)

Step response of the system.

Parameters
  • x0 (1D array) – initial state for simulation

  • u (1D array) – the value of the input over t_span

  • t_span (list) – (t_start, t_stop)

Returns

t_span and the states at its endpoints (x_start, x_stop)

Return type

list, 2D array

LQR(self, Q=None, R=None)

Solve the Riccati equation for the specified value function.

Parameters
  • Q (nxn array, optional) – the Q matrix. Defaults to 1.

  • R (mxm array, optional) – the R matrix. Defaults to 1.

Returns

the K and P matrices

Return type

mxn array, nxn array
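
Example

A sketch of solving the LQR baseline and simulating one sample with the resulting gain. The plant, the weight matrices, and the feedback u = -K x are illustrative assumptions.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)))
    ctrl = LTIController(plant, log_dir='results')

    # solve the Riccati equation for the chosen weights
    K, P = ctrl.LQR(Q=np.eye(2), R=np.eye(1))

    # simulate one sample with the state feedback u = -K x
    x0 = np.ones((2,))
    t, x = ctrl.step(x0, -K @ x0, [0.0, 0.01])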

_isStable(self, A)
onPolicy(self, stop_thres=0.001, viz=True)

Use the on-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (bool, optional) – True for logging data. Defaults to True.

Raises

ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied

Returns

the optimal K and P matrices

Return type

mxn array, nxn array

_afterGainKopt(self, t_plot, x_plot, Kopt, section)
_rowGainOnPloicy(self, K, x_sample, t_sample)
setPolicyParam(self, K0=None, Q=None, R=None, data_eval=0.1, num_data=10, explore_noise=lambda t: ...)

Set up the policy parameters for both the on-policy and off-policy algorithms. Initialize the loggers for the K and P matrices.

Parameters
  • K0 (mxn array, optional) – The initial value of K matrix. Defaults to np.zeros((m,n)).

  • Q (nxn array, optional) – The Q matrix. Defaults to 1.

  • R (mxm array, optional) – The R matrix. Defaults to 1.

  • data_eval (float, optional) – data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

  • num_data (int, optional) – the number of data points collected for each learning iteration. Defaults to 10.

  • explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Raises

ValueError – raised when the initial value of the K matrix is not admissible

Note

  • The K0 matrix must be admissible

  • data_eval must be larger than the sample_time

  • num_data >= n(n+1) + 2mn
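
Example

A sketch of a setPolicyParam call for a 2-state, 1-input plant. The matrices are illustrative; the zero K0 is admissible here only because the assumed A matrix is already stable, and num_data = 10 meets the bound n(n+1) + 2mn = 10 for n = 2, m = 1.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)), sample_time=1e-2)
    ctrl = LTIController(plant)

    n, m = 2, 1
    ctrl.setPolicyParam(K0=np.zeros((m, n)),      # must be an admissible gain
                        Q=np.eye(n), R=np.eye(m),
                        data_eval=0.1,            # must be larger than sample_time
                        num_data=10,              # >= n(n+1) + 2mn = 10 here
                        explore_noise=lambda t: 2 * np.sin(100 * t))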

offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)

Use the off-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (bool, optional) – True for logging data. Defaults to True.

  • max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.

Raises

ValueError – raised when the user-defined number of data points is too small, leaving the rank condition unsatisfied

Returns

the optimal K and P matrices

Return type

mxn array, nxn array
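
Example

An end-to-end learning sketch; the plant and weights are illustrative. onPolicy is called the same way, without max_iter.

    import numpy as np
    from OpenControl.ADP_control import LTI, LTIController

    plant = LTI(np.array([[0.0, 1.0], [-2.0, -3.0]]), np.array([[0.0], [1.0]]))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.ones((2,)))
    ctrl = LTIController(plant)
    ctrl.setPolicyParam(K0=np.zeros((1, 2)), Q=np.eye(2), R=np.eye(1))

    # learn the optimal gain from data
    K_opt, P_opt = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)
    # K_opt, P_opt = ctrl.onPolicy(stop_thres=1e-3, viz=True)   # on-policy variant

    # compare with the model-based LQR solution
    K_lqr, P_lqr = ctrl.LQR(Q=np.eye(2), R=np.eye(1))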

_policyEval(self, dxx, Ixx, Ixu)
_getRowOffPolicyMatrix(self, t_sample, x_sample)
class OpenControl.ADP_control.NonLinController(system, log_dir='results')

This class represents a continuous-time controller for non-linear systems.

system

an instance of the NonLin class

Type

NonLin class

log_dir

the folder that contains all log files. Defaults to 'results'.

Type

string, optional

logX

a Logger instance used for logging the state signals

Type

Logger class

u0

The initial feedback control policy. Defaults to 0.

Type

func(x), optional

q_func

the function q(x). Defaults to NonLinController.default_q_func.

Type

func(x), optional

R

The R matrix. Defaults to 1.

Type

mxm array, optional

phi_func

the sequence of basis functions used to approximate the critic, \phi_j(x). Defaults to NonLinController.default_phi_func.

Type

list of func(x), optional

psi_func

the sequence of basis functions used to approximate the actor, \psi_j(x). Defaults to NonLinController.default_psi_func.

Type

list of func(x), optional

data_eval

data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

Type

float, optional

num_data

the number of data points collected for each learning iteration. Defaults to 10.

Type

int, optional

explore_noise

The exploration noise within the learning stage. Defaults to lambda t:2*np.sin(100*t).

Type

func(t), optional

logWa

logger for the actor weights

Type

Logger class

logWc

logger for the critic weights

Type

Logger class

t_plot, x_plot

used for logging and plotting the simulation results

Type

float, array

viz

True to visualize results on TensorBoard. Defaults to True.

Type

boolean

setPolicyParam(self, q_func=None, R=None, phi_func=None, psi_func=None, u0=lambda x: ..., data_eval=0.1, num_data=10, explore_noise=lambda t: ...)

Set up the policy parameters for the on/off-policy algorithms. Initialize the loggers for the actor and critic weights.

Parameters
  • q_func (func(x), optional) – the function q(x). Defaults to NonLinController.default_q_func.

  • R (mxm array, optional) – The R matrix. Defaults to 1.

  • phi_func (list of func(x), optional) – the sequence of basis functions used to approximate the critic, \phi_j(x). Defaults to NonLinController.default_phi_func.

  • psi_func (list of func(x), optional) – the sequence of basis functions used to approximate the actor, \psi_j(x). Defaults to NonLinController.default_psi_func.

  • u0 (func(x), optional) – The initial feedback control policy. Defaults to 0.

  • data_eval (float, optional) – data_eval * num_data is the time interval of each policy update. Defaults to 0.1.

  • num_data (int, optional) – the number of data points collected for each learning iteration. Defaults to 10.

  • explore_noise (func(t), optional) – The exploration noise within the learning stage. Defaults to lambda t: 2*np.sin(100*t).

Note

  • u0 must be an admissible controller

  • the sequences of basis functions \phi_j(x), \psi_j(x) should be linearly independent and smooth

  • data_eval must be larger than the sample_time

  • num_data >= n(n+1) + 2mn
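
Example

A sketch of configuring a NonLinController for a 2-state, 1-input plant. The dynamics, the initial policy u0, and the noise are illustrative; u0 is chosen so that the closed loop is stable (admissible) for these dynamics.

    import numpy as np
    from OpenControl.ADP_control import NonLin, NonLinController

    def dot_x(t, x, u):
        return np.array([x[1], -x[0] - x[1] + u[0]])   # illustrative dynamics

    plant = NonLin(dot_x, dimension=(2, 1))
    plant.setSimulationParam(t_sim=(0, 10), x0=np.array([1.0, -1.0]))

    ctrl = NonLinController(plant, log_dir='results')
    ctrl.setPolicyParam(u0=lambda x: np.array([-x[0] - x[1]]),   # admissible initial policy
                        R=np.eye(1),
                        data_eval=0.1, num_data=10,
                        explore_noise=lambda t: 2 * np.sin(100 * t))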

step(self, dot_x, x0, t_span)

Step response of the system without input.

Parameters
  • dot_x (func(x)) – the ODE function of the system with no input applied

  • x0 (1D array) – the initial state

  • t_span (tuple) – (t_start, t_stop)

Returns

t_span and the states at its endpoints (x_start, x_stop)

Return type

list, 2D array

feedback(self, viz=True)

Check stability of the initial control policy u0

Parameters

viz (boolean) – True to visualize results on TensorBoard. Defaults to True.

Returns

t_plot and x_plot

Return type

list, 2D array

offPolicy(self, stop_thres=0.001, max_iter=30, viz=True)

Use the off-policy approach to find the optimal adaptive feedback controller; only the dimensions of the system are required.

Parameters
  • stop_thres (float, optional) – threshold value to stop iteration. Defaults to 1e-3.

  • viz (boolean) – True to visualize results on TensorBoard. Defaults to True.

  • unlearned_compare (boolean) – True to log the unlearned state data, for comparison purposes.

  • max_iter (int, optional) – the maximum number of policy iterations. Defaults to 30.

Returns

the final updated weights of the critic and actor neural networks.

Return type

array, array
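
Example

A continuation of the configuration sketch shown after setPolicyParam above: check the initial policy, then learn off-policy. The unpacking order of the returned weights (critic first, then actor) follows the documented return description and is otherwise an assumption.

    # verify that the initial policy u0 stabilizes the system
    t_plot, x_plot = ctrl.feedback(viz=True)

    # learn the optimal controller from data
    Wc, Wa = ctrl.offPolicy(stop_thres=1e-3, max_iter=30, viz=True)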

_unlearn_controller(self, t_plot, x_plot, section)
_afterGainWopt(self, t_plot, x_plot, Waopt, section)
_policyEval(self, dphi, Iq, Iupsi, Ipsipsi)
_getRowOffPolicyMatrix(self, t_sample, x_sample)
static default_psi_func(x)

The default sequence of basis functions used to approximate the actor.

Parameters

x (1xn array) – the state vector

Returns

the polynomial basis functions. If x=[x_1,x_2]^T then \psi(x) = [x_1, x_2, x_1^3, x_1^2x_2, x_1x_2^2, x_2^3]^T

Return type

list func(x)

static default_phi_func(x)

The default sequence of basis functions used to approximate the critic.

Parameters

x (1xn array) – the state vector

Returns

the polynomial basis functions. If x=[x_1,x_2]^T then \phi(x) = [x_1^2, x_1x_2, x_2^2, x_1^4, x_1^2x_2^2, x_2^4]^T

Return type

list func(x)

static default_q_func(x)

The default q(x) function.

Parameters

x (1D array) – the state vector

Returns

x^Tx

Return type

float
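
Example

Hypothetical custom replacements for the default q(x) and basis functions, assuming q_func maps the state to a scalar and phi_func/psi_func are passed as lists of scalar functions of x (the documented attribute types).

    import numpy as np

    def q_func(x):
        return float(np.dot(x, x))            # q(x) = x^T x, same as the default

    phi_func = [lambda x: x[0] ** 2,          # truncated critic basis for n = 2
                lambda x: x[0] * x[1],
                lambda x: x[1] ** 2]

    psi_func = [lambda x: x[0],               # truncated actor basis for n = 2
                lambda x: x[1],
                lambda x: x[0] ** 3]

    # pass them to the controller, e.g.
    # ctrl.setPolicyParam(q_func=q_func, phi_func=phi_func, psi_func=psi_func)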

class OpenControl.ADP_control.Logger(log_dir='results', filename_suffix='')

Bases: object

Real-time visualization of simulation results using TensorBoard.

writer

the underlying TensorBoard SummaryWriter instance

Type

class

log(self, section, signals, step)

Log the signals under the given section name at the given time step.

Parameters
  • section (str) – the name under which the signals are logged

  • signals (array) – signals

  • step (int) – the time step; only int is accepted, so convert float time steps to int before logging

end_log(self)

Call this function to end logging
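
Example

A minimal Logger sketch based on the documented methods; the section name and the logged signal are illustrative.

    import numpy as np
    from OpenControl.ADP_control import Logger

    logger = Logger(log_dir='results', filename_suffix='demo')
    for k in range(100):
        x = np.array([np.sin(0.1 * k), np.cos(0.1 * k)])   # illustrative signal
        logger.log('state', x, k)                          # step must be an int
    logger.end_log()

    # inspect the logs with:  tensorboard --logdir results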