Control in (Generally) Regularised MDPs
This post explores the theory of regularised MDPs beyond entropic regularisation (which we covered in an older post). We will introduce convex regularisation of the classical Bellman operators and study the regularised policy iteration algorithms it induces. Along the way, we will draw connections to several popular algorithms. This post is mostly a good excuse to refresh some convex optimisation classics.