Distributional Dynamic Programming
Control in MDPs is usually framed around the expected return criterion, which has the good taste to satisfy the Bellman dynamic programming equations we all love and cherish. In this post, we see how these equations generalise beyond the expected return to the full distribution of returns. We will cover the distributional Bellman equations for policy evaluation, and see them in action in a practical distributional dynamic programming algorithm based on quantiles.
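To fix ideas before diving in, here is the shape of that generalisation, written as a rough preview (the notation $Q^\pi$, $Z^\pi$, $P$, $R$ is introduced here for illustration, not quoted from later sections). The classical equation constrains a scalar value function, while its distributional counterpart constrains a random return, with $\overset{D}{=}$ denoting equality in distribution:

$$Q^\pi(s, a) \;=\; \mathbb{E}\left[ R(s, a) \right] + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\, a' \sim \pi(\cdot \mid s')}\left[ Q^\pi(s', a') \right]$$

$$Z^\pi(s, a) \;\overset{D}{=}\; R(s, a) + \gamma\, Z^\pi(S', A'), \qquad S' \sim P(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')$$

Taking expectations on both sides of the distributional equation recovers the classical one, which is exactly the sense in which the familiar Bellman equations are a special case of what follows.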