Control in Entropy-Regularized MDPs
This post introduces the Maximum Entropy (MaxEnt) framework for Reinforcement Learning. The goal is to see how “standard” control algorithms such as value iteration and policy iteration carry over to the MaxEnt setting, and to describe their convergence properties. We will also discuss how this approach to control influenced the design of popular modern algorithms such as Soft Actor-Critic (SAC).
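To make “translating value iteration to the MaxEnt setting” concrete, here is a minimal sketch of soft value iteration on a small tabular MDP. It assumes the common entropy-regularized formulation in which the max over actions in the Bellman backup is replaced by a temperature-scaled log-sum-exp, and the greedy policy is replaced by a Boltzmann (softmax) policy. The temperature name `alpha`, the function names, the random MDP and the tolerances are all illustrative assumptions, not a reference implementation.

```python
import numpy as np


def soft_bellman_backup(V, P, R, gamma, alpha):
    """One application of the (assumed) soft Bellman optimality operator.

    Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s')
    V'(s)   = alpha * log sum_a exp(Q(s, a) / alpha)   (a "soft" max over actions)
    """
    Q = R + gamma * (P @ V)                      # (S, A, S) @ (S,) -> (S, A)
    Q_max = Q.max(axis=1)                        # subtract the row max for numerical stability
    V_new = Q_max + alpha * np.log(np.exp((Q - Q_max[:, None]) / alpha).sum(axis=1))
    return Q, V_new


def soft_value_iteration(P, R, gamma=0.9, alpha=0.1, tol=1e-10, max_iter=100_000):
    """Iterate the soft backup until the sup-norm change drops below tol.

    Returns the soft value V, the soft Q-values Q and the Boltzmann policy
    pi(a | s) = exp((Q(s, a) - V(s)) / alpha).
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        _, V_new = soft_bellman_backup(V, P, R, gamma, alpha)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # One last backup so that V is exactly the log-sum-exp of Q (rows of pi sum to 1).
    Q, V = soft_bellman_backup(V, P, R, gamma, alpha)
    pi = np.exp((Q - V[:, None]) / alpha)
    return V, Q, pi


def hard_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Standard value iteration (max instead of log-sum-exp), for comparison."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        V_new = (R + gamma * (P @ V)).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V


# A small random MDP, used purely for illustration.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                # normalize into valid transition probabilities
R = rng.random((S, A))

V_soft, Q_soft, pi_soft = soft_value_iteration(P, R, gamma=0.9, alpha=0.1)
V_hard = hard_value_iteration(P, R, gamma=0.9)
print("soft V:", np.round(V_soft, 3))
print("hard V:", np.round(V_hard, 3))
```

Since log-sum-exp is never smaller than the max, and exceeds it by at most `alpha * log(A)`, the soft values in this sketch sit above the standard ones by at most `alpha * log(A) / (1 - gamma)`, and the soft backup approaches the hard one as `alpha` shrinks. Like the standard backup, the soft backup is a `gamma`-contraction in the sup norm, which is why the loop terminates.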