Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications

Deep reinforcement learning has made significant progress in the last few years, with success stories in robotic control, game playing, and scientific problems. While RL methods present a general paradigm in which an agent learns from its own interaction with an environment, this requirement for “active” data collection is also a major hindrance to applying RL methods to real-world problems, since active data collection is often expensive and potentially unsafe. An alternative “data-driven” paradigm of RL, referred to as offline RL (or batch RL), has recently regained popularity as a viable path towards effective real-world RL. Offline RL requires learning skills solely from previously collected datasets, without any active environment interaction. It provides a way to utilize datasets from a variety of sources, including human demonstrations, prior experiments, domain-specific solutions, and even data from different but related problems, to build complex decision-making engines.
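
To make the distinction concrete, here is a minimal sketch of what an offline training loop looks like. The dataset here is a toy synthetic stand-in, and the agent and its update rule are illustrative placeholders; the key point is simply that learning draws only on fixed, previously collected transitions:

```python
import numpy as np

# Toy stand-in for a previously collected dataset of transitions; in a
# real application this data would come from human demonstrations,
# prior experiments, or logs from another policy.
num_transitions, obs_dim, num_actions = 10_000, 4, 2
dataset = {
    "obs": np.random.randn(num_transitions, obs_dim),
    "actions": np.random.randint(num_actions, size=num_transitions),
    "rewards": np.random.randn(num_transitions),
    "next_obs": np.random.randn(num_transitions, obs_dim),
    "dones": np.zeros(num_transitions),
}

def sample_batch(batch_size=256):
    """Sample a minibatch of transitions from the fixed dataset."""
    idx = np.random.randint(num_transitions, size=batch_size)
    return {key: value[idx] for key, value in dataset.items()}

# The training loop only ever touches the static dataset: there is no
# env.step() call anywhere, unlike in standard online RL.
for step in range(1_000):
    batch = sample_batch()
    # agent.update(batch)  # any offline RL update, e.g. the CQL loss below
```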

Several recent papers [1] [2] [3] [4] [5] [6], including our prior work [7] [8], have discussed that offline RL is a challenging problem: it requires handling distributional shift, which in conjunction with function approximation and sampling error may make it impossible for standard RL methods [9] [10] to learn effectively from a static dataset. However, over the past year, a number of methods have been proposed to tackle this problem, and substantial progress has been made in the area, both in the development of new algorithms and in applications to real-world problems. In this blog post, we will discuss two of our works that advance the frontiers of offline RL: conservative Q-learning (CQL), a simple and effective algorithm for offline RL, and COG, a framework for robotic learning that leverages effective offline RL methods such as CQL to connect past data with recent experience, enabling a kind of “common sense” generalization when the robot is asked to perform a task.
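
As a rough sketch of the core idea behind CQL (not the authors’ exact implementation), the loss below shows what the conservative objective can look like for a discrete-action Q-network in PyTorch. The names `q_network`, `target_network`, and the batch layout are illustrative placeholders:

```python
import torch
import torch.nn.functional as F

def cql_loss(q_network, target_network, batch, alpha=1.0, gamma=0.99):
    """CQL-style loss for discrete actions: the usual TD error plus a
    conservative term that pushes down Q-values on actions outside the
    dataset while preserving Q-values on actions inside it."""
    obs, actions, rewards, next_obs, dones = batch  # tensors from the fixed dataset

    q_values = q_network(obs)                                    # (B, num_actions)
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Standard Q-learning target, computed without gradient flow.
    with torch.no_grad():
        next_q = target_network(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q

    td_loss = F.mse_loss(q_taken, target)

    # Conservative regularizer: E_s[logsumexp_a Q(s, a) - Q(s, a_data)].
    # This discourages the learned Q-function from assigning inflated
    # values to actions the dataset never covers.
    conservative_loss = (torch.logsumexp(q_values, dim=1) - q_taken).mean()

    return td_loss + alpha * conservative_loss
```

The weight `alpha` trades off conservatism against the standard TD objective; larger values keep the learned policy closer to the dataset’s action distribution.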
