Asynchronous Advantage Actor-Critic (A3C)
Asynchronous Advantage Actor-Critic (A3C)
A3C, Asynchronous Advantage Actor Critic, is a policy gradient algorithm in reinforcement learning that maintains a policy and an estimate of the value function . It operates in the forward view and uses a mix of $n$-step returns to update both the policy and the value-function.