Usage

Running Experiments

  1. Define the nonlinear control problem in repr_control/define_problem.py. The following items need to be defined:

    • Dynamics

    • Reward function

    • Initial state distribution

    • State and action bounds

    • Maximum rollout steps

    • Noise level

    The current file defines the inverted pendulum as an example:

    """
    The nonlinear control problem is defined in this file.
    """

    import torch
    import numpy as np
    ########################################################################################################################
    # 1. define problem-related constants
    ########################################################################################################################
    state_dim = 3                       # state dimension
    action_dim = 1                      # action dimension
    state_range = [[-1, -1, -8],
                   [1, 1, 8]]           # low and high. The state is bounded to ensure stable training.
    action_range = [[-2], [2]]          # low and high
    max_step = 200                      # maximum rollout steps per episode
    sigma = 0.05                        # noise standard deviation
    env_name = 'Pendulum'
    assert len(action_range[0]) == len(action_range[1]) == action_dim

    ########################################################################################################################
    # 2. define dynamics model, reward function and initial distribution.
    ########################################################################################################################
    def dynamics(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """
        The dynamics. Must be written in PyTorch to enable automatic differentiation.
        The inputs and output are 2D tensors: the first dimension is the batch size and the second
        dimension is the state (or action) dimension. For example, a batch of pendulum states looks like
        [[cos(theta), sin(theta), dot theta],
         [cos(theta), sin(theta), dot theta],
         ...,
         [cos(theta), sin(theta), dot theta]
         ]

        Parameters
        ----------
        state            torch.Tensor, [batch_size, state_dim]
        action           torch.Tensor, [batch_size, action_dim]

        Returns
        -------
        next_state       torch.Tensor, [batch_size, state_dim]
        """
        g = 10.0        # gravitational acceleration
        m = 1.          # pendulum mass
        l = 1.          # pendulum length
        max_a = 2.      # torque limit
        dt = 0.05       # integration time step
        max_speed = 8   # angular velocity limit
        cos_th, sin_th, thdot = state[:, 0], state[:, 1], state[:, 2]
        th = torch.atan2(sin_th, cos_th)
        action = torch.reshape(action, (action.shape[0],))
        u = torch.clip(action, -max_a, max_a)
        newthdot = thdot + (3. * g / (2 * l) * torch.sin(th) + 3.0 / (m * l ** 2) * u) * dt
        newthdot = torch.clip(newthdot, -max_speed, max_speed)
        newth = th + newthdot * dt
        next_state = torch.vstack([torch.cos(newth), torch.sin(newth), newthdot]).T
        assert next_state.shape == state.shape
        return next_state

    def rewards(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        """
        The reward. Must be written in PyTorch to enable automatic differentiation.

        Parameters
        ----------
        state            torch.Tensor, [batch_size, state_dim]
        action           torch.Tensor, [batch_size, action_dim]

        Returns
        -------
        rewards          torch.Tensor, [batch_size,]
        """
        cos_th, sin_th, thdot = state[:, 0], state[:, 1], state[:, 2]
        th = torch.atan2(sin_th, cos_th)
        action = torch.reshape(action, (action.shape[0],))
        # Penalize deviation from upright (theta = 0), angular velocity and control effort.
        reward = -0.3 * (th ** 2 + 0.1 * thdot ** 2 + 0.001 * action ** 2)
        return reward

    def initial_distribution(batch_size: int) -> torch.Tensor:
        # Sample theta uniformly from [-pi, pi) and theta dot uniformly from [-1, 1).
        th = 2 * np.pi * torch.rand((batch_size)) - np.pi
        thdot = 2 * torch.rand((batch_size)) - 1
        return torch.vstack([torch.cos(th),
                             torch.sin(th),
                             thdot]).T
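
To illustrate what filling in define_problem.py for a different system might look like, the sketch below defines a hypothetical 1D double integrator (state = [position, velocity], action = [force]) in the same format, using the same semi-implicit Euler update pattern as the pendulum. The helpers check_problem and noisy_rollout are also hypothetical and not part of repr_control: check_problem verifies the batch-shape contract stated in the docstrings, and noisy_rollout ASSUMES that sigma is the standard deviation of additive Gaussian noise on the next state and that states are clipped to state_range. How repr_control actually uses these constants may differ.

```python
import torch

# Hypothetical problem definition in the style of define_problem.py.
state_dim = 2                        # [position, velocity]
action_dim = 1                       # [force]
state_range = [[-10, -5], [10, 5]]   # low and high
action_range = [[-1], [1]]           # low and high
max_step = 100                       # maximum rollout steps per episode
sigma = 0.05                         # noise standard deviation (assumed additive Gaussian)
dt = 0.1                             # integration time step


def dynamics(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    pos, vel = state[:, 0], state[:, 1]
    u = torch.clip(action[:, 0], action_range[0][0], action_range[1][0])
    new_vel = vel + u * dt           # update velocity first (semi-implicit Euler)
    new_pos = pos + new_vel * dt     # then position with the new velocity
    return torch.stack([new_pos, new_vel], dim=1)


def rewards(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    # Quadratic penalty on distance from the origin, velocity and control effort.
    return -(state[:, 0] ** 2 + 0.1 * state[:, 1] ** 2 + 0.01 * action[:, 0] ** 2)


def initial_distribution(batch_size: int) -> torch.Tensor:
    # Position and velocity both uniform in [-1, 1).
    return 2 * torch.rand((batch_size, state_dim)) - 1


def check_problem(batch_size: int = 8) -> None:
    """Hypothetical helper: check the [batch_size, dim] shape contract."""
    s = initial_distribution(batch_size)
    a = torch.zeros((batch_size, action_dim))
    assert s.shape == (batch_size, state_dim)
    assert dynamics(s, a).shape == (batch_size, state_dim)
    assert rewards(s, a).shape == (batch_size,)


def noisy_rollout(batch_size: int, steps: int = max_step) -> torch.Tensor:
    """Hypothetical helper: total reward of uniformly random actions.

    ASSUMPTION: next state = dynamics + sigma * N(0, I), then clipped to state_range.
    """
    low = torch.tensor(state_range[0], dtype=torch.float32)
    high = torch.tensor(state_range[1], dtype=torch.float32)
    a_low = torch.tensor(action_range[0], dtype=torch.float32)
    a_high = torch.tensor(action_range[1], dtype=torch.float32)
    s = initial_distribution(batch_size)
    total = torch.zeros(batch_size)
    for _ in range(steps):
        a = a_low + (a_high - a_low) * torch.rand((batch_size, action_dim))
        total += rewards(s, a)
        s = dynamics(s, a) + sigma * torch.randn_like(s)
        s = torch.maximum(torch.minimum(s, high), low)  # keep states inside state_range
    return total
```

Calling check_problem() raises an AssertionError if any function breaks the shape contract, and noisy_rollout(4) returns a tensor of shape [4] with one total reward per sampled initial state. Running such a check before training is a cheap way to catch shape mistakes in a new problem definition.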
    

Advanced Usage

Define training hyperparameters

You can set training hyperparameters by passing command-line arguments to solve.py.

For example,

  • setting the maximum number of training steps:

$ python solve.py --max_step 2e5
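
The flags that solve.py actually accepts are defined in solve.py itself; --max_step is the only one documented here. As a hedged sketch of the mechanism, assuming solve.py parses its arguments with Python's argparse, the hypothetical build_parser below shows how a value written in scientific notation such as 2e5 can be read as an integer step count. The default value is a placeholder, not solve.py's real default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser illustrating how a training script could expose --max_step."""
    parser = argparse.ArgumentParser(description="Train a controller (illustrative sketch).")
    # int(float(s)) accepts both "200000" and scientific notation such as "2e5";
    # a plain type=int would reject "2e5" because int("2e5") raises ValueError.
    parser.add_argument(
        "--max_step",
        type=lambda s: int(float(s)),
        default=100_000,  # placeholder value, not necessarily solve.py's default
        help="maximum number of training steps",
    )
    return parser
```

For example, build_parser().parse_args(["--max_step", "2e5"]).max_step evaluates to the integer 200000.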

Inspect the training results with TensorBoard:

$ # during/after training
$ tensorboard --logdir $LOG_PATH