Value Function

A value function predicts the expected return from a state (or state-action pair) when an agent follows a specific policy; a sampling-based sketch of estimating these quantities follows the definitions below.

  • On-Policy Value Function $V^{\pi}(s)$: Expected return when starting in state $s$ and following policy $\pi$:

$$ V^{\pi}(s) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right] $$

  • On-Policy Action-Value Function $Q^{\pi}(s,a)$: Expected return when starting in state $s$, taking action $a$, then following policy $\pi$:

$$ Q^{\pi}(s,a) = \underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right] $$

  • Optimal Value Function $V^*(s)$: Expected return when starting in state $s$ and following the optimal policy:

$$ V^*(s) = \max_{\pi}\underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s\right] $$

  • Optimal Action-Value Function $Q^*(s,a)$: Expected return when starting in state $s$, taking action $a$, then following the optimal policy:

$$ Q^*(s,a) = \max_{\pi}\underset{\tau \sim \pi}{\mathbb{E}}\left[R(\tau) \mid s_0 = s, a_0 = a\right] $$
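These expectations can be approximated by Monte Carlo sampling: roll out many trajectories under $\pi$ and average their returns. Below is a minimal sketch, assuming a hypothetical two-state, two-action MDP, a fixed stochastic policy, a discount factor $\gamma = 0.9$, and a finite horizon; none of these specifics come from the definitions above. Estimating $V^*$ or $Q^*$ would additionally require maximizing over policies, which the sketch does not attempt.

```python
import random

# Minimal sketch: Monte Carlo estimates of V^pi(s) and Q^pi(s,a) on a
# hypothetical 2-state, 2-action MDP. The environment, policy, and all
# constants below are illustrative assumptions, not part of the text above.

GAMMA = 0.9    # assumed discount factor (the definitions leave R(tau) abstract)
HORIZON = 50   # truncate each rollout at a fixed horizon

def step(state, action):
    """Toy dynamics: action 1 tends to move toward state 1, which pays reward 1."""
    next_state = 1 if random.random() < (0.8 if action == 1 else 0.2) else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def policy(state):
    """A fixed stochastic policy pi(a|s): prefers action 1 in every state."""
    return 1 if random.random() < 0.7 else 0

def rollout_return(state, first_action=None):
    """Sample one trajectory tau ~ pi and return its discounted return R(tau)."""
    total, discount = 0.0, 1.0
    for t in range(HORIZON):
        # For Q^pi(s,a), the first action is forced to a; afterwards, follow pi.
        action = first_action if (t == 0 and first_action is not None) else policy(state)
        state, reward = step(state, action)
        total += discount * reward
        discount *= GAMMA
    return total

def v_pi(state, n=5000):
    """V^pi(s): average return over n trajectories starting at s under pi."""
    return sum(rollout_return(state) for _ in range(n)) / n

def q_pi(state, action, n=5000):
    """Q^pi(s,a): as above, but the first action is fixed to a."""
    return sum(rollout_return(state, action) for _ in range(n)) / n

if __name__ == "__main__":
    for s in (0, 1):
        print(f"V^pi({s}) ~ {v_pi(s):.3f}")
        for a in (0, 1):
            print(f"Q^pi({s},{a}) ~ {q_pi(s, a):.3f}")
```

A sanity check on the output: since $V^{\pi}(s)$ averages over the first action drawn from $\pi$, each printed $V^{\pi}(s)$ should land between the corresponding $Q^{\pi}(s,0)$ and $Q^{\pi}(s,1)$, up to sampling noise.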