Proximal Policy Optimization (PPO)


Proximal Policy Optimization, or PPO, is a policy gradient method for reinforcement learning. It was designed to match the data efficiency and reliable performance of Trust Region Policy Optimization (TRPO) while using only first-order optimization.