Sequence Modeling Solutions for Reinforcement Learning Problems


Long-horizon predictions of (top) the Trajectory Transformer compared to those of (bottom) a single-step dynamics model.

Modern machine learning success stories often have one thing in common: they use methods that scale gracefully with ever-increasing amounts of data. This is particularly clear from recent advances in sequence modeling, where simply increasing the size of a stable architecture and its training set leads to qualitatively different capabilities.

Meanwhile, the situation in reinforcement learning has proven more complicated. While it has been possible to apply reinforcement learning algorithms to large-scale problems, doing so has generally involved much more friction. In this post, we explore whether we can alleviate these difficulties by tackling the reinforcement learning problem with the toolbox of sequence modeling. The end result is a generative model of trajectories that looks like a large language model and a planning algorithm that looks like beam search. Code for the approach can be found here.
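Beam search is simple to state: at each step, keep only the few highest-scoring partial sequences and extend those. The sketch below is a generic beam search over action sequences, not the authors' planner. It assumes a known, deterministic toy dynamics function (the hypothetical `toy_step`, a walker on a number line rewarded for staying close to position 3) standing in for a learned trajectory model, and it scores each candidate sequence by its cumulative reward.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # state type: any value the step function understands


def beam_search(
    initial_state: S,
    actions: List[int],
    step: Callable[[S, int], Tuple[S, float]],
    horizon: int,
    beam_width: int,
) -> Tuple[List[int], float]:
    """Generic beam search over action sequences, scored by cumulative reward."""
    # Each beam entry: (cumulative reward, action sequence so far, current state).
    beams: List[Tuple[float, List[int], S]] = [(0.0, [], initial_state)]
    for _ in range(horizon):
        candidates = []
        for total, seq, state in beams:
            # Expand every surviving partial sequence with every candidate action.
            for a in actions:
                next_state, reward = step(state, a)
                candidates.append((total + reward, seq + [a], next_state))
        # Keep only the `beam_width` highest-reward partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_total, best_seq, _ = beams[0]
    return best_seq, best_total


def toy_step(position: int, action: int) -> Tuple[int, float]:
    """Hypothetical dynamics: move by `action`; reward is minus the distance to 3."""
    next_position = position + action
    return next_position, -float(abs(next_position - 3))


if __name__ == "__main__":
    plan, score = beam_search(0, [-1, 0, 1], toy_step, horizon=4, beam_width=2)
    print(plan, score)  # [1, 1, 1, 0] -3.0
```

A wider beam trades extra computation for a better chance of finding the highest-return sequence; with `beam_width=1` the procedure reduces to greedy search. In a sequence-modeling planner, the expansion and scoring would come from a learned model rather than a hand-written function like `toy_step`.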

The Trajectory Transformer

The standard framing of reinforcement learning focuses on decomposing a complicated long-horizon problem into smaller, more tractable subproblems, leading to dynamic programming methods like $Q$-learning and an emphasis on Markovian dynamics models. However, we can also view reinforcement learning as analogous to a sequence generation problem, with the goal being to produce a sequence of actions that, when enacted in an environment, will yield a sequence of high rewards.
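Under this view, a trajectory is simply a sequence of tokens. The sketch below shows one straightforward way to produce such a sequence, under assumptions that are not taken from the source: every state, action, and reward dimension is discretized with the same uniform bins over a fixed range `[low, high]`, and each timestep is laid out as state, then action, then reward. A sequence model trained on these token lists would be fit with the usual next-token cross-entropy, included here as a plain function over predicted probability vectors. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def discretize(value: float, low: float, high: float, num_bins: int) -> int:
    """Map a continuous value to one of `num_bins` uniform bins over [low, high]."""
    clipped = min(max(value, low), high)
    # Fraction of the way through the range, scaled to a bin index.
    idx = int((clipped - low) / (high - low) * num_bins)
    # The upper endpoint `high` would land in bin `num_bins`; fold it into the last bin.
    return min(idx, num_bins - 1)


def flatten_trajectory(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
    low: float,
    high: float,
    num_bins: int,
) -> List[int]:
    """Interleave discretized (state, action, reward) per timestep into one token list."""
    tokens: List[int] = []
    for s, a, r in zip(states, actions, rewards):
        for x in list(s) + list(a) + [r]:
            tokens.append(discretize(x, low, high, num_bins))
    return tokens


def cross_entropy(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of each target token under its predicted distribution."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


if __name__ == "__main__":
    tokens = flatten_trajectory(
        states=[[0.0], [0.6]],
        actions=[[-1.0], [0.25]],
        rewards=[1.0, -0.3],
        low=-1.0,
        high=1.0,
        num_bins=4,
    )
    print(tokens)  # [2, 0, 3, 3, 2, 1]
```

Sharing one range and one vocabulary across all dimensions keeps the example short; a practical tokenizer would likely use per-dimension ranges, and might offset each dimension's bins so that tokens from different dimensions do not collide.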

Taking this view to its logical conclusion, we begin by modeling the trajectory data provided to reinforcement learning algorithms with a Transformer architecture, the current tool of choice for natural language modeling. We treat these trajectories as unstructured sequences of discretized states, actions, and rewards, and train the Transformer architecture using the standard cross-entropy loss. Modeling all trajectory data with a

This article is purposely trimmed; please visit the source to read the full article.

The post Sequence Modeling Solutions for Reinforcement Learning Problems appeared first on The Berkeley Artificial Intelligence Research Blog.
