static FunctionNamespace.calculate_reinforcement_learning_step(*, agent, environment_feedbacks, trainable=True, episode_done=False, **kwargs)

Perform a single step of reinforcement learning.

Use this function to initialize a reinforcement learning agent, update it with states observed from the device or simulator of your choice, and generate further segments of an optimized control pulse. This approach is useful when your system is too complicated to model, when computing gradients is expensive or impossible, or when closed-loop optimization fails.

  • agent (qctrl.dynamic.types.reinforcement_learning_step.Agent) – The reinforcement learning agent. Use this argument either to initialize the agent or to update the agent’s state.

  • environment_feedbacks (List[qctrl.dynamic.types.reinforcement_learning_step.EnvironmentFeedback]) – The batch of feedbacks (observation and reward pairs). Each feedback causes the agent to return an action. You must pass the same number of feedbacks at each step within an episode.

  • trainable (bool, optional) – Whether the agent’s policy should be updated as it interacts with its environment. Defaults to True. To freeze the policy as is, set this flag to False.

  • episode_done (bool, optional) – Indicates that an episode has reached completion. Defaults to False.
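
The way these parameters interact over an episode can be illustrated with a minimal, self-contained sketch. It does not call the real library: `fake_rl_step`, `Feedback`, the dictionary used as the agent, and the toy environment are all hypothetical stand-ins, and only the keyword-only argument names are taken from the signature above. The sketch shows a fixed-size batch of feedbacks being passed at every step, one action coming back per feedback, `episode_done` being set on the final step, and `trainable=False` freezing the policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Feedback:
    """Hypothetical stand-in for an observation and reward pair."""
    observation: float
    reward: float


def fake_rl_step(*, agent: Dict[str, int], environment_feedbacks: List[Feedback],
                 trainable: bool = True,
                 episode_done: bool = False) -> Tuple[Dict[str, int], List[float]]:
    """Hypothetical stand-in that mirrors only the documented argument names."""
    # One action is produced for each feedback in the batch.
    actions = [0.5 * fb.observation for fb in environment_feedbacks]
    new_agent = {
        "steps": agent["steps"] + 1,
        # The toy "policy" is only updated when trainable is True.
        "policy_updates": agent["policy_updates"] + (1 if trainable else 0),
        "episodes": agent["episodes"] + (1 if episode_done else 0),
    }
    return new_agent, actions


def run_episode(agent: Dict[str, int], batch_size: int, n_steps: int,
                trainable: bool = True) -> Tuple[Dict[str, int], List[float]]:
    """Run one episode, keeping the batch size fixed at every step."""
    feedbacks = [Feedback(observation=0.0, reward=0.0) for _ in range(batch_size)]
    actions: List[float] = []
    for step in range(n_steps):
        agent, actions = fake_rl_step(
            agent=agent,
            environment_feedbacks=feedbacks,
            trainable=trainable,
            episode_done=(step == n_steps - 1),  # flag only the final step
        )
        # Toy environment (an assumption): turn each action into a new feedback,
        # so the next batch has the same length as the previous one.
        feedbacks = [Feedback(observation=a + 1.0, reward=-abs(a - 1.0))
                     for a in actions]
    return agent, actions


initial_agent = {"steps": 0, "policy_updates": 0, "episodes": 0}
trained_agent, last_actions = run_episode(initial_agent, batch_size=3, n_steps=4)
frozen_agent, _ = run_episode(trained_agent, batch_size=3, n_steps=4, trainable=False)
```

In real use, the stand-in function would be replaced by the library call and the toy environment by measurements from your device or simulator; the structure of the loop is the only part this sketch is meant to convey.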


Result from a reinforcement learning step.

Return type:


See also


Perform a single step computation for closed-loop optimization.


Perform gradient-based deterministic optimization of generic real-valued functions.


Perform gradient-based stochastic optimization of generic real-valued functions.


See the How to optimize controls starting from an incomplete system model user guide.