class PolicyGradientInitializer(*, discrete_action_space_size, reward_discount_factor=0.95, learning_rate=0.01, learning_rate_decay_factor=0.999, min_learning_rate=None, seed=None, rng_seed=None)

Configuration for the policy-gradient-based agent. The agent trains a policy, represented by a neural network, to maximize the long-term discounted reward it receives when interacting with the user-specified environment. The policy is updated at the end of each complete episode.

  • discrete_action_space_size (int) – The number of possible actions the agent can choose from. No further information about the action space is needed for this agent. Must be positive. See Also: Action.discreteIndex

  • reward_discount_factor (float, optional) – The degree to which far-off rewards are discounted. Must be between zero and one. A value of zero makes the agent maximally near-sighted: it learns to maximize one-step returns. A value of one makes the agent weight all rewards equally, so it learns to maximize the total undiscounted return over an entire trajectory. Defaults to 0.95.

  • learning_rate (float, optional) – Indicates the learning rate used to optimize the agent’s policy neural network. Must be positive, and is typically smaller than one. Defaults to 0.01.

  • learning_rate_decay_factor (float, optional) – The geometric decay rate applied to the learning rate of the policy network. Must be between zero and one. The current learning rate is multiplied by this factor after each episode. Decay continues until the learning rate reaches min_learning_rate, where it plateaus. Defaults to 0.999.

  • min_learning_rate (float, optional) – The floor for the policy network’s learning rate; decay stops once this value is reached. Must be positive and less than or equal to learning_rate. Defaults to 75% of learning_rate.

  • seed (int, optional) – Seed for the random number generator. If set, must be non-negative. Use this option to generate deterministic results from the agent.

  • rng_seed (int, optional) – Deprecated. This parameter will be removed; use seed instead.
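
The effect of reward_discount_factor and of the learning rate decay parameters can be illustrated with a small standalone sketch. It does not use or reproduce the library's internals: the helper names discounted_returns and learning_rate_schedule are hypothetical, and clamping the decayed rate with max() is an assumption about how the plateau at min_learning_rate might behave.

```python
def discounted_returns(rewards, discount):
    """Return the discounted return for every step of one episode.

    Hypothetical helper: the return at step t is
    rewards[t] + discount * (return at step t + 1).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


def learning_rate_schedule(learning_rate, decay_factor,
                           min_learning_rate=None, episodes=5):
    """Return the learning rate in effect for each of `episodes` episodes.

    Hypothetical helper: the rate is multiplied by `decay_factor` after each
    episode and is assumed to be clamped at `min_learning_rate`.
    """
    if min_learning_rate is None:
        # Documented default: 75% of the initial learning rate.
        min_learning_rate = 0.75 * learning_rate
    rates = []
    current = learning_rate
    for _ in range(episodes):
        rates.append(current)
        current = max(current * decay_factor, min_learning_rate)
    return rates


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(rewards, 0.0))   # one-step returns only
    print(discounted_returns(rewards, 1.0))   # undiscounted sums of remaining rewards
    print(learning_rate_schedule(0.01, 0.999, episodes=3))
```

Under these assumptions and the documented defaults (learning_rate 0.01, decay factor 0.999, floor at 75%), the rate would reach its plateau after roughly 290 episodes.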