Proximal Policy Optimization (PPO)
- https://arxiv.org/abs/1707.06347v2
- https://paperswithcode.com/method/ppo
- https://docs.cleanrl.dev/rl-algorithms/ppo/
Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning. It was motivated by the goal of matching the data efficiency and reliable performance of Trust Region Policy Optimization (TRPO) while using only first-order optimization.
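To make "first-order only" concrete, here is a minimal sketch of the clipped surrogate objective described in the PPO paper linked above. It replaces TRPO's explicit trust-region constraint with a simple clip on the probability ratio. The function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions, not something this note specifies. The sketch uses plain Python lists, whereas a real implementation would typically use an autodiff library.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the old and new log-probabilities are identical, every ratio is 1 and the objective reduces to the mean advantage. Once the ratio moves outside the clip range in the direction that would increase the objective, the clipped term takes over and the gradient incentive for that sample vanishes.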