Advantage Function

The advantage function $A^{\pi}(s,a)$ corresponding to a policy $\pi$ describes how much better it is to take a specific action $a$ in state $s$ than to sample an action from $\pi(\cdot|s)$, assuming you act according to $\pi$ forever after.

$$ A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s). $$
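
As a rough illustration of the formula above, the sketch below computes advantages in a small tabular setting. It relies on the standard identity $V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot|s)}[Q^{\pi}(s,a)]$. The function name `advantage` and the arrays `q_values` and `policy` are hypothetical, chosen only for this example, and the numbers are made up rather than taken from any real environment.

```python
import numpy as np


def advantage(q, pi):
    """Return A[s, a] = Q[s, a] - V[s] for tabular Q-values and a policy.

    q  : array of shape (num_states, num_actions), Q^pi(s, a)
    pi : array of the same shape, pi(a | s), each row summing to 1
    """
    q = np.asarray(q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # V^pi(s) is the expectation of Q^pi(s, a) over actions drawn from pi.
    v = (pi * q).sum(axis=1, keepdims=True)
    # Broadcasting subtracts each state's value from every action in that row.
    return q - v


# Made-up example: 2 states, 2 actions (assumed values, not real data).
q_values = np.array([[1.0, 3.0],
                     [0.0, 4.0]])
policy = np.array([[0.50, 0.50],
                   [0.25, 0.75]])

adv = advantage(q_values, policy)
print(adv)
```

A positive entry means the action is better than the policy's average behavior in that state, and a negative entry means it is worse. Because $V^{\pi}(s)$ is the policy-weighted mean of $Q^{\pi}(s,\cdot)$, the advantages in each state average to zero under $\pi$.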