Deep reinforcement learning has made significant progress in the last few years, with success stories in robotic control, game playing and science problems. While RL methods present a general paradigm where an agent learns from its own interaction with an environment, this requirement for “active” data collection is also a major hindrance in the application of RL methods to real-world problems, since active data collection is often expensive and potentially unsafe. An alternative “data-driven” paradigm of RL, referred to as offline RL (or batch RL) has recently regained popularity as a viable path towards effective real-world RL. As shown in the figure below, offline RL requires learning skills solely from previously collected datasets, without any active environment interaction. It provides a way to utilize previously collected datasets from a variety of sources, including human demonstrations, prior experiments, domain-specific solutions and even data from different but related problems, to build complex decision-making engines.
Several recent papers [1] [2] [3] [4] [5] [6], including our prior work [7] [8], have discussed that offline RL is a challenging problem — it requires handling distributional shifts, which in conjunction with function approximation and sampling error may make it impossible for standard RL methods [9] [10] to learn effectively from just a static dataset. However, over the past year, a number of methods have been proposed to tackle this problem, and substantial progress has been made in the area, both in development of new algorithms and applications to real-world problems. In this blog post, we will discuss two of our works that advance the frontiers of offline RL — conservative Q-learning (CQL), a simple and effective algorithm for offline RL and COG, a framework for robotic learning that leverages effective offline RL methods such as CQL, to allow agents to connect past data with recent experience, enabling a kind of “common sense” generalization when the robot is tasked with performing a task
This article is purposely trimmed, please visit the source to read the full article.
The post Offline Reinforcement Learning: How Conservative Algorithms Can Enable New Applications appeared first on The Berkeley Artificial Intelligence Research Blog.