jaxplore • Twin Delayed Deep Deterministic (TD3)

Twin Delayed Deep Deterministic Policy Gradient (TD3)

https://arxiv.org/abs/1802.09477v3
https://paperswithcode.com/method/td3
https://docs.cleanrl.dev/rl-algorithms/td3/

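TD3 builds on the DDPG algorithm for reinforcement learning, with three modifications aimed at reducing overestimation bias in the learned value function. It utilises clipped double Q-learning (two critics are trained and the smaller of their two estimates forms the learning target), delayed updates of the policy and target networks (these are updated less often than the critics), and target policy smoothing (clipped noise is added to the target action before it is evaluated).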
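
To make the three modifications concrete, here is a minimal JAX sketch of how they might fit together. It is an illustration under stated assumptions, not the reference implementation: the function names (`td3_target`, `soft_update`, `should_update_policy`), the callable signatures for the actor and critics, and the default hyperparameter values are all chosen for this example.

```python
import jax
import jax.numpy as jnp


def td3_target(key, next_obs, reward, done, actor_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the critic regression target (hypothetical helper).

    actor_target(obs) -> action and q*_target(obs, action) -> value are
    assumed to be plain callables with their parameters already bound.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_obs)
    noise = policy_noise * jax.random.normal(key, next_action.shape)
    noise = jnp.clip(noise, -noise_clip, noise_clip)
    next_action = jnp.clip(next_action + noise, -max_action, max_action)

    # Clipped double Q-learning: use the smaller of the two target estimates.
    q_next = jnp.minimum(q1_target(next_obs, next_action),
                         q2_target(next_obs, next_action))

    # Bootstrapped target; terminal transitions (done == 1) drop the bootstrap.
    return reward + gamma * (1.0 - done) * q_next


def soft_update(target_params, online_params, tau=0.005):
    """Polyak-average online parameters into the target parameters."""
    return jax.tree_util.tree_map(
        lambda t, o: (1.0 - tau) * t + tau * o, target_params, online_params)


def should_update_policy(step, policy_delay=2):
    """Delayed updates: refresh the actor and targets every policy_delay steps."""
    return step % policy_delay == 0
```

In a training loop, both critics would be regressed towards `td3_target` on every step, while the actor update and the `soft_update` calls would run only when `should_update_policy(step)` is true. Passing the actor and critics as callables keeps the sketch independent of any particular neural network library.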