calculate_reinforcement_learning_step

static FunctionNamespace.calculate_reinforcement_learning_step(*, agent, environment_feedbacks, trainable=True, episode_done=False, **kwargs)

Perform a single step of reinforcement learning.

Use this function to initialize a reinforcement learning agent, update it with observed states from the device or simulator of your choice, and generate further segments of an optimized control pulse. You can use this approach when your system is too complicated to model, when computing gradients is expensive or impossible, or when closed-loop optimization fails.

Parameters
  • agent (qctrl.dynamic.types.reinforcement_learning_step.Agent) – The reinforcement learning agent. Use this parameter either to initialize the agent or to update the agent’s state.

  • environment_feedbacks (List[qctrl.dynamic.types.reinforcement_learning_step.EnvironmentFeedback]) – The batch of feedbacks (observation and reward pairs). Each feedback causes the agent to return one action. You must pass the same number of feedbacks at each step within an episode.

  • trainable (bool, optional) – Whether the agent’s policy is updated as it interacts with its environment. Defaults to True. To freeze the policy in its current form, set this flag to False.

  • episode_done (bool, optional) – Indicates that an episode has reached completion. Defaults to False.
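
The requirement that every step within an episode carries the same number of feedbacks can be checked on the client side before any call is made. The following is a minimal sketch only: `Feedback` and `check_episode_batches` are hypothetical names introduced for illustration and are not part of the Q-CTRL client; `Feedback` merely stands in for an observation and reward pair.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Feedback:
    """Hypothetical stand-in for one observation and reward pair."""

    observation: Sequence[float]
    reward: float


def check_episode_batches(steps: List[List[Feedback]]) -> int:
    """Return the batch size shared by all steps, or raise if they differ."""
    if not steps:
        raise ValueError("an episode needs at least one step")
    batch_size = len(steps[0])
    for index, feedbacks in enumerate(steps):
        if len(feedbacks) != batch_size:
            raise ValueError(
                f"step {index} has {len(feedbacks)} feedbacks, "
                f"expected {batch_size}"
            )
    return batch_size


# Two steps of one episode, each carrying a batch of two feedbacks.
episode = [
    [Feedback([0.0, 1.0], 0.2), Feedback([1.0, 0.0], 0.4)],
    [Feedback([0.5, 0.5], 0.6), Feedback([0.1, 0.9], 0.3)],
]
batch = check_episode_batches(episode)
```

Running a check like this before submitting feedbacks catches a mismatched batch early, rather than partway through an episode.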

Returns

Result from a reinforcement learning step.

Return type

qctrl.dynamic.types.reinforcement_learning_step.Result

See also

calculate_closed_loop_optimization_step()

Perform a single-step computation for closed-loop optimization.

calculate_optimization()

Perform gradient-based deterministic optimization of generic real-valued functions.

calculate_stochastic_optimization()

Perform gradient-based stochastic optimization of generic real-valued functions.

Examples

See the "How to optimize controls starting from an incomplete system model" user guide.
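
As a rough illustration of the call pattern only, the sketch below runs one episode against a local stand-in. Everything in it is an assumption made for illustration: `mock_rl_step` is a hypothetical replacement for the real function (it shares the keyword names from the signature above but returns a plain tuple rather than a Result object), `mock_measure` is a hypothetical environment, and the agent is a plain dictionary of counters. The sketch also assumes that `episode_done` is set on the last call of an episode. It does not contact any Q-CTRL service and does not learn anything.

```python
import random


def mock_rl_step(*, agent, environment_feedbacks, trainable=True, episode_done=False):
    """Hypothetical stand-in for the real call: returns one action per feedback."""
    rng = random.Random(agent["steps_seen"])  # seeded so the sketch is repeatable
    actions = [rng.uniform(-1.0, 1.0) for _ in environment_feedbacks]
    new_agent = {
        "steps_seen": agent["steps_seen"] + 1,
        "episodes_seen": agent["episodes_seen"] + (1 if episode_done else 0),
        # A frozen policy (trainable=False) is modelled as "no update counted".
        "updates": agent["updates"] + (1 if trainable else 0),
    }
    return new_agent, actions


def mock_measure(action):
    """Hypothetical environment: the reward is largest when the action is 0.3."""
    observation = [action]
    reward = -((action - 0.3) ** 2)
    return observation, reward


def run_episode(agent, batch_size, segment_count, trainable=True):
    """Run one episode, passing the same number of feedbacks at every step."""
    feedbacks = [([0.0], 0.0) for _ in range(batch_size)]  # placeholder start
    history = []
    for segment in range(segment_count):
        is_last = segment == segment_count - 1
        agent, actions = mock_rl_step(
            agent=agent,
            environment_feedbacks=feedbacks,
            trainable=trainable,
            episode_done=is_last,
        )
        history.append(actions)
        # Each action is applied to the environment and yields one feedback.
        feedbacks = [mock_measure(action) for action in actions]
    return agent, history


agent = {"steps_seen": 0, "episodes_seen": 0, "updates": 0}
agent, history = run_episode(agent, batch_size=3, segment_count=4)
frozen_agent, _ = run_episode(agent, batch_size=3, segment_count=4, trainable=False)
```

The second episode passes `trainable=False`, so the stand-in keeps counting steps and episodes but records no further policy updates, which mirrors the intent of freezing the policy.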