Successor States and Representations (1/2)
The main promise of unsupervised RL is test-time adaptation to newly specified reward functions. This requires systematically disentangling reward from dynamics in the standard RL toolkit. In this first post of a short series, we see how this can be done via the concept of successor states and successor representations. We will focus on policy evaluation, leaving control to a follow-up post.
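To make the promise concrete before diving in, here is a minimal sketch of the idea in the tabular case. The setup (a hypothetical 4-state chain with a fixed policy) is an illustrative assumption, not an example from this series: the successor representation $M = (I - \gamma P)^{-1}$ depends only on the policy and the dynamics, so once it is computed, evaluating the policy under any new reward is a single matrix-vector product.

```python
import numpy as np

# Hypothetical toy MDP: 4 states, fixed policy pi, discount gamma.
n, gamma = 4, 0.9
P = np.array([  # P[s, s'] = Pr(s' | s) under pi
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])

# Successor representation: M = (I - gamma * P)^{-1}.
# M[s, s'] is the expected discounted number of visits to s'
# starting from s under pi -- it encodes dynamics, not reward.
M = np.linalg.inv(np.eye(n) - gamma * P)

# Policy evaluation for a newly specified reward is now just V = M r.
r_left = np.array([1.0, 0.0, 0.0, 0.0])   # reward at state 0
r_right = np.array([0.0, 0.0, 0.0, 1.0])  # reward at state 3
print(M @ r_left)   # value function for the "left" task
print(M @ r_right)  # value function for the "right" task, no re-solving
```

The reward-and-dynamics untangling is exactly the split between `M` and `r`: the expensive, reward-free object is computed once, and adapting to a new task at test time costs only a matrix-vector product.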