The Successor Representation, Gamma-Models, and Infinite-Horizon Prediction
Standard single-step models have a horizon of one. This post describes a method for training predictive dynamics models in continuous state spaces with an infinite, probabilistic horizon.
Reinforcement learning algorithms are frequently categorized by whether they predict future states at any point in their decision-making process. Those that do are called model-based, and those that do not are dubbed model-free. This classification is so common that we mostly take it for granted these days; I am guilty of using it myself. However, this distinction is not as clear-cut as it may initially seem.
In this post, I will talk about an alternative view that emphasizes the mechanism of prediction instead of the content of prediction. This shift in focus brings into relief a space between model-based and model-free methods that contains exciting directions for reinforcement learning. The first half of this post describes some of the classic tools in this space, including generalized value functions and the successor representation. The latter half is based on our recent paper about infinite-horizon predictive models, for which code has been released.
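To make "infinite, probabilistic horizon" concrete before going further, here is a minimal tabular sketch of the classic successor representation. It assumes a small, hypothetical 3-state Markov chain `P` (not from the paper) and uses the textbook definition M[s][s2] = sum over k >= 0 of gamma^k (P^k)[s][s2], the expected discounted number of visits to s2 starting from s. Scaling a row by (1 - gamma) turns it into a probability distribution over future states, and that distribution can be sampled by rolling out a geometrically distributed number of steps. The helper names are illustrative only. This is a discrete analogue, not the gamma-model from the paper, which targets continuous state spaces where such a matrix cannot be written down.

```python
import random

# Hypothetical 3-state chain acting as a one-step model:
# P[s][s2] is the probability of moving from s to s2 in one step.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]


def successor_representation(P, gamma, iters=2000):
    """Fixed-point iteration M <- I + gamma * P @ M.

    Converges (error shrinks by a factor of gamma per iteration) to
    M[s][s2] = sum_{k>=0} gamma**k * (P**k)[s][s2].
    """
    n = len(P)
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        M = [
            [float(i == j) + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def sample_discounted_future_state(s0, P, gamma, rng):
    """Draw a state from the normalized occupancy (1 - gamma) * M[s0].

    The number of steps K is geometric: P(K = k) = (1 - gamma) * gamma**k,
    so each extra step is taken with probability gamma.
    """
    s = s0
    while rng.random() < gamma:
        s = rng.choices(range(len(P)), weights=P[s])[0]
    return s


gamma = 0.9
M = successor_representation(P, gamma)
occupancy = [(1 - gamma) * m for m in M[0]]  # sums to 1 (up to rounding)
print(occupancy)
```

Every row of M sums to 1 / (1 - gamma), since the chain is always in some state at every step. The Monte Carlo sampler never needs M at all; it only queries the one-step model a random number of times.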
The what versus how of prediction
The dichotomy between model-based and model-free algorithms focuses on what is predicted directly: states or values. Instead, I want to focus on how these predictions are made, and specifically how these approaches deal with the complexities arising from long horizons.
Dynamics models, for instance, approximate a single-step transition distribution, meaning that they are trained on a prediction problem with a horizon of one. To make a short-horizon model useful for long-horizon queries, its single-step predictions are composed into sequential model-based rollouts. We could say that the “testing” horizon of a model-based method is the length of the rollout.
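The composition described above can be sketched in a few lines for a tabular model. This assumes a hypothetical 3-state transition matrix `P` standing in for a learned single-step model; `step`, `rollout_distribution`, and `sample_rollout` are illustrative names, not code from the paper. A model trained with a horizon of one answers an H-step query only by being applied H times, either to a whole distribution or to sampled states.

```python
import random

# Hypothetical single-step "dynamics model": P[s][s2] is the probability
# of moving from state s to state s2 in one step.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]


def step(dist, P):
    """Push a distribution over states through the one-step model."""
    n = len(P)
    return [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def rollout_distribution(dist, P, horizon):
    """Compose the one-step model `horizon` times to predict H steps ahead."""
    for _ in range(horizon):
        dist = step(dist, P)
    return dist


def sample_rollout(s0, P, horizon, rng):
    """Sample one trajectory by repeatedly querying the one-step model."""
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        states.append(rng.choices(range(len(P)), weights=P[s])[0])
    return states


start = [1.0, 0.0, 0.0]  # all probability mass on state 0
print(rollout_distribution(start, P, 2))
print(sample_rollout(0, P, 5, random.Random(0)))
```

The cost and the compounding of model error both grow with the number of compositions, which is why the rollout length effectively sets the horizon of a model-based method.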