Improving Generalization in Reinforcement Learning using Policy Similarity Embeddings

Posted by Rishabh Agarwal, Research Associate, Google Research, Brain Team

Reinforcement learning (RL) is a sequential decision-making paradigm for training intelligent agents to tackle complex tasks such as robotic locomotion, playing video games, flying stratospheric balloons and designing hardware chips. While RL agents have shown promising results in a variety of activities, it is difficult to transfer the capabilities of these agents to new tasks, even when these tasks are semantically equivalent. For example, consider a jumping task, where an agent, learning from image observations, needs to jump over an obstacle. Deep RL agents trained on a few of these tasks with varying obstacle positions struggle to jump successfully when the obstacle appears at a previously unseen location.

Jumping task: The agent (white block), learning from pixels, needs to jump over an obstacle (gray square). The challenge is to generalize to unseen obstacle positions and floor heights in test tasks using a small number of training tasks. In a given task, the agent needs to time the jump precisely, at a specific distance from the obstacle; otherwise, it will eventually hit the obstacle.

In “Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning”, presented as a spotlight at ICLR 2021, we incorporate the inherent sequential structure in RL into the representation learning process to enhance generalization to unseen tasks. This is orthogonal to the predominant approaches before this work, which were typically adapted from supervised learning and, as such, largely ignored this sequential aspect. Our approach exploits the fact that an agent, when operating in tasks with similar underlying mechanics, exhibits at least short sequences of behaviors that are similar across these tasks.

Prior work on generalization was typically adapted from supervised learning and revolved around enhancing the learning process. These approaches rarely exploit properties of the sequential aspect such as similarity in actions across temporal observations.
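To make the idea of "short sequences of behaviors that are similar across tasks" concrete, here is a minimal toy sketch. It is not the method from the paper: it assumes a hypothetical 1-D version of the jumping task in which the optimal policy is to move right and jump at a fixed distance before the obstacle. The names `JUMP_DISTANCE`, `TRACK_LENGTH`, `optimal_actions`, `future_window`, and `matching_states` are all invented for illustration, and the environment dynamics are not simulated.

```python
# Toy illustration only: all constants and function names are hypothetical.
JUMP_DISTANCE = 3   # assumed: the agent must jump this many cells before the obstacle
TRACK_LENGTH = 20   # assumed: number of cells on the 1-D track


def optimal_actions(obstacle_pos):
    """Return the assumed optimal action at every agent position for one task."""
    return [
        "jump" if x == obstacle_pos - JUMP_DISTANCE else "right"
        for x in range(TRACK_LENGTH)
    ]


def future_window(actions, x, horizon):
    """Optimal actions for the next `horizon` steps, starting at position x."""
    return tuple(actions[x:x + horizon])


def matching_states(pos_a, pos_b, horizon=5):
    """Pair states from two tasks whose short-horizon optimal behavior is identical.

    Only windows that contain the jump are considered, since windows made up
    entirely of "right" actions match almost everywhere and carry no signal.
    """
    acts_a = optimal_actions(pos_a)
    acts_b = optimal_actions(pos_b)
    last_start = TRACK_LENGTH - horizon  # keep every window at full length
    pairs = []
    for x in range(last_start + 1):
        window_a = future_window(acts_a, x, horizon)
        if "jump" not in window_a:
            continue
        for y in range(last_start + 1):
            if future_window(acts_b, y, horizon) == window_a:
                pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Two tasks that differ only in obstacle position (cells 8 and 12).
    print(matching_states(8, 12))
```

In this toy setting, every matched pair of states differs by exactly the shift in obstacle position (4 cells here), even though the raw positions, and hence the observations, differ. That is the kind of cross-task behavioral structure a representation could exploit, under the assumptions stated above.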

Our approach trains the agent to learn a representation

The post Improving Generalization in Reinforcement Learning using Policy Similarity Embeddings appeared first on Google AI Blog.
