calculate_reinforcement_learning_step(*, agent, environment_feedbacks, trainable=True, episode_done=False, **kwargs)
Perform a single step of reinforcement learning.
Use this function to initialize a reinforcement learning agent, update it with observed states from the device or simulator of your choice, and generate further segments of an optimized control pulse. You can use this approach when your system is too complicated to model, when computing gradients is expensive or impossible, or when closed-loop optimization fails.
agent (qctrl.dynamic.types.reinforcement_learning_step.Agent) – The reinforcement learning agent. Use this argument either to initialize the agent or to update the agent’s state.
environment_feedbacks (List[qctrl.dynamic.types.reinforcement_learning_step.EnvironmentFeedback]) – The batch of feedbacks (observation and reward pairs). The agent returns one action for each feedback. You must pass the same number of feedbacks at each step within an episode.
trainable (bool, optional) – Whether to update the agent’s policy as it interacts with its environment. Defaults to True. To freeze the policy as it is, set this flag to False.
episode_done (bool, optional) – Indicates that an episode has reached completion. Defaults to False.
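The calling pattern implied by these parameters can be sketched as a loop over the steps of one episode. The sketch below does not call the real service: `fake_rl_step`, `EnvironmentFeedback`, `StepResult`, `measure`, and `run_episode` are hypothetical stand-ins written for illustration only, and the agent is represented as a plain dict. What it tries to show is the contract described above: one action per feedback, a constant number of feedbacks within an episode, `episode_done=True` on the last step, and `trainable=False` leaving the policy untouched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EnvironmentFeedback:
    """Hypothetical stand-in for one observation and reward pair."""
    observation: List[float]
    reward: float


@dataclass
class StepResult:
    """Hypothetical stand-in for the step result: one action per feedback."""
    actions: List[float]
    agent_state: dict


def fake_rl_step(*, agent, environment_feedbacks, trainable=True, episode_done=False):
    """Stub that mimics the documented keyword-only signature (assumption, not the real call)."""
    state = dict(agent)
    expected = state.get("batch_size")
    # The number of feedbacks must stay constant within an episode.
    if expected is not None and expected != len(environment_feedbacks):
        raise ValueError("batch size must stay constant within an episode")
    # Forget the batch size once the episode ends, so a new episode may differ.
    state["batch_size"] = None if episode_done else len(environment_feedbacks)
    if trainable:
        # Count policy updates; a frozen policy is never updated.
        state["updates"] = state.get("updates", 0) + 1
    # Toy policy: one action per feedback.
    actions = [0.1 * fb.reward for fb in environment_feedbacks]
    return StepResult(actions=actions, agent_state=state)


def measure(action):
    """Hypothetical environment: the reward peaks when the action equals 0.5."""
    return -((action - 0.5) ** 2)


def run_episode(agent, n_steps, batch_size, trainable=True):
    """Run one episode, feeding each step's actions back as new feedbacks."""
    feedbacks = [EnvironmentFeedback([0.0], 0.0) for _ in range(batch_size)]
    result = None
    for step in range(n_steps):
        result = fake_rl_step(
            agent=agent,
            environment_feedbacks=feedbacks,
            trainable=trainable,
            episode_done=(step == n_steps - 1),  # flag the final step
        )
        agent = result.agent_state
        feedbacks = [EnvironmentFeedback([a], measure(a)) for a in result.actions]
    return agent, result
```

With these stand-ins, `run_episode({}, n_steps=3, batch_size=4)` performs three policy updates and returns four actions on the final step, while a later call with `trainable=False` reuses the agent without updating it.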
Result from a reinforcement learning step.
- Return type
Perform a single-step computation for closed-loop optimization.
Performs gradient-based deterministic optimization of generic real-valued functions.
Performs gradient-based stochastic optimization of generic real-valued functions.