Reward and Return
The reward function $R$ depends on the current state of the world, the action just taken, and the next state of the world:
$$ r_t = R(s_t, a_t, s_{t+1}) $$
The return $R(\tau)$ is a measure of cumulative reward over a trajectory $\tau$. Two common cases are:
- finite-horizon undiscounted return: the sum of rewards obtained in a fixed window of steps:
$$ R(\tau) = \sum_{t=0}^T r_t $$
- infinite-horizon discounted return: the sum of all rewards ever obtained by the agent, discounted by how far in the future they are obtained. With discount factor $\gamma \in (0,1)$, which guarantees the infinite sum converges when rewards are bounded:
$$ R(\tau) = \sum_{t=0}^{\infty} \gamma^t r_t $$
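The two definitions above can be sketched directly in code. This is a minimal illustration (the function names are chosen here for clarity, not taken from any library): given a sequence of per-step rewards $r_0, \dots, r_T$, we compute the undiscounted sum and the discounted sum $\sum_t \gamma^t r_t$.

```python
def finite_horizon_return(rewards):
    """Finite-horizon undiscounted return: sum of rewards over a fixed window."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.99):
    """Discounted return: sum of gamma^t * r_t over the (truncated) trajectory.

    In practice an infinite-horizon trajectory is truncated at some T,
    so this computes a finite approximation of the infinite sum.
    """
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]
print(finite_horizon_return(rewards))            # 3.0
print(discounted_return(rewards, gamma=0.5))     # 1 + 0.5 + 0.25 = 1.75
```

Note how the discount factor shrinks the contribution of later rewards: with $\gamma = 0.5$, the reward at step 2 counts only a quarter as much as the reward at step 0.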