jaxplore • Transformer-XL (PPO-TrXL)

Transformer-XL (PPO-TrXL)

https://arxiv.org/abs/2309.17207
https://docs.cleanrl.dev/rl-algorithms/ppo-trxl/

Real-world tasks may expose only imperfect information (e.g. partial observability). To act well in such tasks, an agent needs some form of memory. One option is a recurrent neural network (e.g. LSTM), as seen in ppo_atari_lstm.py. Here, Transformer-XL is used instead as the episodic memory in Proximal Policy Optimization (PPO).
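To make the idea of episodic memory concrete, here is a minimal JAX sketch of an agent attending over a sliding window of its own past embeddings. It is an illustration under simplifying assumptions, not the PPO-TrXL implementation: it uses a single attention layer with one head, no positional encoding, and no PPO training loop. All names (`MEMORY_LENGTH`, `DIM`, `init_params`, `attend_to_memory`, `rollout`) are hypothetical.

```python
import jax
import jax.numpy as jnp

MEMORY_LENGTH = 4  # hypothetical: number of past steps kept in the window
DIM = 8            # hypothetical: embedding size


def init_params(key, dim):
    """Random projection matrices for one single-head attention layer."""
    k_q, k_k, k_v = jax.random.split(key, 3)
    scale = 1.0 / jnp.sqrt(dim)
    return {
        "w_q": jax.random.normal(k_q, (dim, dim)) * scale,
        "w_k": jax.random.normal(k_k, (dim, dim)) * scale,
        "w_v": jax.random.normal(k_v, (dim, dim)) * scale,
    }


def attend_to_memory(params, x, memory, mask):
    """Attend from the current embedding x (dim,) over memory (window, dim).

    mask (window,) is True for slots that hold real past embeddings.
    Returns the residual output and the attention weights.
    """
    q = x @ params["w_q"]
    k = memory @ params["w_k"]
    v = memory @ params["w_v"]
    scores = (k @ q) / jnp.sqrt(x.shape[-1])
    scores = jnp.where(mask, scores, -1e9)   # hide empty slots
    weights = jax.nn.softmax(scores)
    weights = jnp.where(mask, weights, 0.0)  # all-empty memory contributes nothing
    return x + weights @ v, weights


def rollout(params, observations):
    """Process one episode step by step with a sliding memory window."""
    dim = observations.shape[-1]
    memory = jnp.zeros((MEMORY_LENGTH, dim))
    mask = jnp.zeros((MEMORY_LENGTH,), dtype=bool)
    outputs = []
    for x in observations:
        h, _ = attend_to_memory(params, x, memory, mask)
        outputs.append(h)
        # Slide the window: drop the oldest slot, append the current embedding.
        memory = jnp.concatenate([memory[1:], x[None]], axis=0)
        mask = jnp.concatenate([mask[1:], jnp.array([True])])
    return jnp.stack(outputs)


params = init_params(jax.random.PRNGKey(0), DIM)
observations = jax.random.normal(jax.random.PRNGKey(1), (6, DIM))
outputs = rollout(params, observations)
print(outputs.shape)  # (6, 8)
```

Unlike an LSTM, which compresses the past into a fixed-size hidden state, this sketch keeps the last `MEMORY_LENGTH` embeddings explicitly and lets attention decide which of them matter at each step. Anything older than the window is no longer visible to the agent.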