jaxplore • Proximal Policy Optimization (PPO)

https://arxiv.org/abs/1707.06347v2
https://paperswithcode.com/method/ppo
https://docs.cleanrl.dev/rl-algorithms/ppo/

Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning. It was designed to match the data efficiency and reliable performance of Trust Region Policy Optimization (TRPO) while using only first-order optimization.
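
To make "first-order only" concrete, here is a minimal JAX sketch of the clipped surrogate objective from the paper linked above, min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t), where r_t is the probability ratio between the new and old policies and A_t is an advantage estimate. The function name `ppo_clip_loss`, its argument names, and the toy batch are illustrative assumptions, not part of any library; `clip_eps=0.2` follows the value used in the paper. A full PPO loss usually also includes value-function and entropy terms, which are left out here.

```python
import jax
import jax.numpy as jnp


def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, suitable for minimization.

    Hypothetical helper for illustration only.
    """
    # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t), in log space.
    ratio = jnp.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: element-wise minimum, averaged over the batch, negated.
    return -jnp.mean(jnp.minimum(unclipped, clipped))


# Toy batch (made-up numbers): the ratios are 1.0, 1.5 and 0.5.
log_probs_old = jnp.log(jnp.array([0.5, 0.5, 0.5]))
log_probs_new = jnp.log(jnp.array([0.5, 0.75, 0.25]))
advantages = jnp.array([1.0, 1.0, -1.0])

loss = ppo_clip_loss(log_probs_new, log_probs_old, advantages)
# Only first-order information is needed: a plain gradient of the loss.
grads = jax.grad(ppo_clip_loss)(log_probs_new, log_probs_old, advantages)
print(loss, grads)
```

In this toy batch the second and third samples have ratios outside [0.8, 1.2] in the direction the advantage favors, so the clipped term is selected and their gradients are zero. That is how the objective discourages large policy updates without a second-order trust-region constraint.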