Papers

A curated collection of papers on reinforcement learning.

  • Proximal Policy Optimization Algorithms

    Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
    Published: July 2017

    We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a 'surrogate' objective function using stochastic gradient ascent.

  • Trust Region Policy Optimization

    Authors: John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
    Published: February 2015

    We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO).

  • Deep Reinforcement Learning with Double Q-Learning

    Authors: Hado van Hasselt, Arthur Guez, David Silver
    Published: September 2015

    Shows that deep Q-learning (DQN) can substantially overestimate action values, and adapts the earlier tabular Double Q-learning idea to deep networks (Double DQN) to reduce this overestimation bias.

  • Policy Gradient Methods for Reinforcement Learning with Function Approximation

    Authors: Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
    Published: June 2000

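    Seminal work on policy gradient methods for reinforcement learning. It shows that the gradient of expected return with respect to the policy parameters can be estimated from experience using an approximate action-value function.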
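
The overestimation bias mentioned in the Double Q-Learning entry can be seen in a toy setting. The sketch below is not the algorithm from that paper: it assumes every action has a true value of 0 and that the value estimates are corrupted by independent Gaussian noise. The function name estimate_bias and all of its parameters are hypothetical, chosen only for this illustration. Taking the maximum over one set of noisy estimates is biased upward, while selecting the action with one set of estimates and evaluating it with a second, independent set is not.

```python
import random


def estimate_bias(num_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Compare single- and double-estimator values of max_a Q(s, a).

    Assumption for this toy example: every action's true value is 0,
    so the true maximum is 0 and any nonzero average is estimation bias.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) action values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]

        # Single estimator: select and evaluate with the same estimates.
        single_total += max(q_a)

        # Double estimator: select the action with q_a, evaluate it with q_b.
        best_action = max(range(num_actions), key=q_a.__getitem__)
        double_total += q_b[best_action]

    return single_total / trials, double_total / trials


single_bias, double_bias = estimate_bias()
print(f"single estimator bias: {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

With these settings the single-estimator average should come out clearly positive and the double-estimator average close to zero. Decoupling action selection from action evaluation is the idea the paper carries over to deep Q-networks.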