Usage
Running Experiments
Define the nonlinear control problem in repr_control/define_problem.py. The following items need to be defined:
Dynamics
Reward function
Initial distributions
State and action bounds
Maximum rollout steps
Noise level
The current file is an example for the inverted pendulum.
"""
We need to define the nonlinear control problems in this file.
"""

import torch
import numpy as np
########################################################################################################################
# 1. define problem-related constants
########################################################################################################################
state_dim = 3  # state dimension
action_dim = 1  # action dimension
state_range = [[-1, -1, -8],
               [1, 1, 8]]  # low and high. We set bounds on the state to ensure stable training.
action_range = [[-2], [2]]  # low and high
max_step = 200  # maximum rollout steps per episode
sigma = 0.05  # noise standard deviation.
env_name = 'Pendulum'
assert len(action_range[0]) == len(action_range[1]) == action_dim

########################################################################################################################
# 2. define dynamics model, reward function and initial distribution.
########################################################################################################################
def dynamics(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """
    The dynamics. Needs to be written in pytorch to enable auto differentiation.
    The inputs and outputs should be 2D Tensors, where the first dimension is the batch size and the second dimension
    is the state. For example, the pendulum state looks like
    [[cos(theta), sin(theta), dot theta],
     [cos(theta), sin(theta), dot theta],
     ...,
     [cos(theta), sin(theta), dot theta]
    ]

    Parameters
    ----------
    state            torch.Tensor, [batch_size, state_dim]
    action           torch.Tensor, [batch_size, action_dim]

    Returns
    -------
    next_state       torch.Tensor, [batch_size, state_dim]

    """
    g = 10.0
    m = 1.
    l = 1.
    max_a = 2.
    dt = 0.05
    max_speed = 8
    cos_th, sin_th, thdot = state[:, 0], state[:, 1], state[:, 2]
    th = torch.atan2(sin_th, cos_th)
    action = torch.reshape(action, (action.shape[0],))
    u = torch.clip(action, -max_a, max_a)
    newthdot = thdot + (3. * g / (2 * l) * torch.sin(th) + 3.0 / (m * l ** 2) * u) * dt
    newthdot = torch.clip(newthdot, -max_speed, max_speed)
    newth = th + newthdot * dt
    next_state = torch.vstack([torch.cos(newth), torch.sin(newth), newthdot]).T
    assert next_state.shape == state.shape
    return next_state

def rewards(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """
    The reward. Needs to be written in pytorch to enable auto differentiation.

    Parameters
    ----------
    state            torch.Tensor, [batch_size, state_dim]
    action           torch.Tensor, [batch_size, action_dim]

    Returns
    -------
    rewards          torch.Tensor, [batch_size,]

    """
    cos_th, sin_th, thdot = state[:, 0], state[:, 1], state[:, 2]
    th = torch.atan2(sin_th, cos_th)
    action = torch.reshape(action, (action.shape[0],))
    reward = -0.3 * (th ** 2 + 0.1 * thdot ** 2 + 0.001 * action ** 2)
    return reward

def initial_distribution(batch_size: int) -> torch.Tensor:
    """
    Sample initial states: theta uniform in [-pi, pi), thdot uniform in [-1, 1).
    """
    th = 2 * np.pi * torch.rand((batch_size)) - np.pi
    thdot = 2 * torch.rand((batch_size)) - 1
    return torch.vstack([torch.cos(th),
                         torch.sin(th),
                         thdot]).T
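Before training on a new problem, it can help to check that the definitions satisfy the shape contract described in the docstrings above. The following is a minimal sketch, not part of repr_control: it uses a hypothetical toy linear system in place of the pendulum, and it assumes that noise is additive Gaussian with standard deviation sigma and that states are clipped to state_range after each step. How the library itself applies noise and bounds is not shown in this section.

```python
import torch

# Toy stand-ins for the items in define_problem.py (hypothetical, for illustration only).
state_dim, action_dim = 2, 1
state_range = [[-1.0, -1.0], [1.0, 1.0]]  # low and high
action_range = [[-2.0], [2.0]]            # low and high
max_step = 20                             # rollout steps per episode
sigma = 0.05                              # noise standard deviation

def dynamics(state, action):
    # Damped linear system; the action broadcasts over the state dimension.
    return 0.9 * state + 0.1 * action

def rewards(state, action):
    # Quadratic cost on state and action, returned with shape [batch_size].
    return -(state ** 2).sum(dim=1) - 0.001 * (action ** 2).sum(dim=1)

def initial_distribution(batch_size):
    # Uniform in [-1, 1) for every state coordinate.
    return 2 * torch.rand(batch_size, state_dim) - 1

def noisy_rollout(batch_size, policy, seed=0):
    """Roll out max_step steps and assert the expected shapes at every step."""
    torch.manual_seed(seed)
    s_low, s_high = torch.tensor(state_range[0]), torch.tensor(state_range[1])
    a_low, a_high = torch.tensor(action_range[0]), torch.tensor(action_range[1])
    state = initial_distribution(batch_size)
    assert state.shape == (batch_size, state_dim)
    episode_return = torch.zeros(batch_size)
    for _ in range(max_step):
        # Keep the action inside action_range.
        action = torch.min(torch.max(policy(state), a_low), a_high)
        assert action.shape == (batch_size, action_dim)
        reward = rewards(state, action)
        assert reward.shape == (batch_size,)
        next_state = dynamics(state, action)
        # Assumption: additive Gaussian noise, then clipping to state_range.
        next_state = next_state + sigma * torch.randn_like(next_state)
        state = torch.min(torch.max(next_state, s_low), s_high)
        assert state.shape == (batch_size, state_dim)
        episode_return = episode_return + reward
    return state, episode_return

# A simple proportional policy, used only to drive the rollout.
final_state, episode_return = noisy_rollout(8, policy=lambda s: -s[:, :1])
```

Replacing the toy functions with the ones from define_problem.py should exercise the same checks on the pendulum, since they follow the same [batch_size, dim] convention.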
Advanced Usage
Define training hyperparameters
You can set training hyperparameters by passing command-line arguments when running solve.py.
For example, to set the maximum number of training steps:
$ python solve.py --max_step 2e5
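The value 2e5 is written in scientific notation, which Python's int() cannot parse from a string. The sketch below shows one way such a flag could be accepted with argparse. It is a hypothetical parser written for illustration: build_parser and the default value are assumptions, and the actual argument definitions in solve.py are not shown in this section.

```python
import argparse

def build_parser():
    # Hypothetical parser; solve.py's real argument definitions may differ.
    parser = argparse.ArgumentParser(description="Sketch of hyperparameter parsing")
    # type=float accepts scientific notation such as 2e5; cast to int afterwards.
    parser.add_argument("--max_step", type=float, default=2e5,
                        help="maximum number of training steps")
    return parser

# Parse the same arguments as in the command above.
args = build_parser().parse_args(["--max_step", "2e5"])
max_training_steps = int(args.max_step)
print(max_training_steps)  # prints 200000
```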
Inspect the training results using TensorBoard:
$ # during/after training
$ tensorboard --logdir $LOG_PATH