class PolicyGradientInitializer(*, discrete_action_space_size, reward_discount_factor=0.95, learning_rate=0.01, learning_rate_decay_factor=0.999, min_learning_rate=None, rng_seed=None)

Configuration for the policy-gradient-based agent. The agent trains a policy, represented by a neural network, to maximize the long-term discounted reward it receives while interacting with the user-specified environment. The policy is updated at the end of each complete episode.
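The discounted objective above can be sketched in a few lines. `discounted_return` is a hypothetical helper, not part of this API; it only illustrates how the discount factor weights rewards that arrive later in an episode.

```python
def discounted_return(rewards, discount_factor=0.95):
    """Return G_0 = sum_k discount_factor**k * rewards[k] for one episode."""
    g = 0.0
    # Accumulate backwards through the episode: g = r_t + discount_factor * g.
    for r in reversed(rewards):
        g = r + discount_factor * g
    return g
```

With `discount_factor=0` only the first reward matters (maximally near-sighted); with `discount_factor=1` every reward in the trajectory counts equally.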

Variables
• discrete_action_space_size (int) – The number of possible actions the agent can choose from. No further information about the action space is needed for this agent. Must be positive. See Also: Action.discreteIndex

• reward_discount_factor (float, optional) – Indicates the level of discounting that far-off rewards receive. 0 <= reward_discount_factor <= 1. A reward_discount_factor of zero means the agent will learn to be maximally near-sighted and maximize one-step returns. A reward_discount_factor of one means the agent will treat all rewards equally and learn to maximize the average return over an entire trajectory.

• learning_rate (float, optional) – Indicates the learning rate used to optimize the agent’s policy neural network. 0 < learning_rate, typically <= 1.

• learning_rate_decay_factor (float, optional) – Indicates the geometric decay rate for the policy network’s gradient update step. 0 < learning_rate_decay_factor <= 1. This factor is multiplicatively applied to the current learning rate after each episode. Decay continues until min_learning_rate is reached, after which the learning rate plateaus.

• min_learning_rate (float, optional) – Indicates the minimum policy neural network learning rate, below which decay stops. 0 < min_learning_rate <= learning_rate. Defaults to 75% of learning_rate.

• rng_seed (int, optional) – Seed for the random number generator. Use this option to generate deterministic results from the agent.
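The interaction of learning_rate, learning_rate_decay_factor, and min_learning_rate can be sketched as a simple schedule. This is an illustrative model of the documented behavior, not the library’s internal code; `learning_rate_for_episode` is a hypothetical helper, and the 75% default floor is taken from the description above.

```python
def learning_rate_for_episode(episode, learning_rate=0.01,
                              decay_factor=0.999, min_learning_rate=None):
    """Learning rate after `episode` multiplicative decay steps, floored at the minimum."""
    if min_learning_rate is None:
        # Per the description above, the floor defaults to 75% of the initial rate.
        min_learning_rate = 0.75 * learning_rate
    # Geometric decay, clamped so it plateaus once the minimum is hit.
    return max(min_learning_rate, learning_rate * decay_factor ** episode)
```

For example, with the defaults the rate starts at 0.01 and decays by 0.1% per episode until it plateaus at 0.0075.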