Sequence Modeling Solutions for Reinforcement Learning Problems
Long-horizon predictions of (top) the Trajectory Transformer compared to those of (bottom) a single-step dynamics model.
Modern machine learning success stories often have one thing in common: they use methods that scale gracefully with ever-increasing amounts of data. This is particularly clear from recent advances in sequence modeling, where simply increasing the size of a stable architecture and its training set leads to qualitatively different capabilities.
Meanwhile, the situation in reinforcement learning has proven more complicated. While it has been possible to apply reinforcement learning algorithms to large-scale problems, generally there has been much more friction in doing so. In this post, we explore whether we can alleviate these difficulties by tackling the reinforcement learning problem with the toolbox of sequence modeling. The end result is a generative model of trajectories that looks like a large language model and a planning algorithm that looks like beam search. Code for the approach can be found here.