A PPO Saga
For better or for worse, proximal policy optimisation (PPO) and its variants dominate the RL landscape these days. This post aims to retrace their journey, from foundational concepts to LLM-savvy innovations. We will start this saga on the theoretical trail, which we will progressively abandon to pay closer attention to algorithmic aspects.