Average Reward Control (1/2)
Thanks to its relative simplicity and conciseness, the discounted approach to control in MDPs has come to dominate the RL theory and practice landscape. Departing from the myopic nature of discounted control, we study here the average-reward objective, which focuses on long-term, steady-state rewards. To start gently, we will limit ourselves to establishing Bellman equations for policy evaluation.
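To make the contrast concrete, here are the two objectives side by side, in standard notation that we assume for illustration (rewards $r(s_t, a_t)$ collected along a trajectory started at $s_1 = s$ under policy $\pi$):

% Discounted objective: geometrically down-weights future rewards,
% which is the source of its myopic flavor (more so for small gamma).
\[
V_\gamma^\pi(s) \;=\; \mathbb{E}^\pi\!\left[\,\sum_{t=1}^{\infty} \gamma^{t-1}\, r(s_t, a_t) \,\middle|\, s_1 = s\right],
\qquad \gamma \in [0, 1),
\]
% Average-reward (gain) objective: the long-run reward rate, insensitive
% to any finite prefix of the trajectory. In general the limit need not
% exist and is replaced by a limsup; for stationary policies in finite
% MDPs the Cesaro limit is well defined.
\[
g^\pi(s) \;=\; \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}^\pi\!\left[\,\sum_{t=1}^{T} r(s_t, a_t) \,\middle|\, s_1 = s\right].
\]

Note how any fixed, finite stretch of rewards contributes nothing to $g^\pi$: only the steady-state behavior of the policy matters, which is exactly the long-term focus described above.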