Deep reinforcement learning has made significant progress in the last few years, with success stories in robotic control, game playing and science problems. While RL methods present a general paradigm where an agent learns from its own interaction with an environment, this requirement for “active” data collection is also a major hindrance in the application of RL methods to real-world problems, since active data collection is often expensive and potentially unsafe. An alternative “data-driven” paradigm of RL, referred to as offline RL (or batch RL) has recently regained popularity as a viable path towards effective real-world RL. As shown in the figure below, offline RL requires learning skills solely from previously collected datasets, without any active environment interaction. It provides a way to utilize previously collected datasets from a variety of sources, including human demonstrations, prior experiments, domain-specific solutions and even data from different but related problems, to build complex decision-making engines.
Several recent papers      , including our prior work  , have discussed that offline RL is a challenging problem — it requires handling distributional shifts, which in conjunction with function approximation and sampling error may make it impossible for standard RL methods   to learn effectively from just a static dataset. However, over the past year, a number of methods have been proposed to tackle this problem, and substantial progress has been made in the area, both in development of new algorithms and applications to real-world problems. In this blog post, we will discuss two of our works that advance the frontiers of offline RL — conservative Q-learning (CQL), a simple and effective algorithm for offline RL and COG, a framework for robotic learning that leverages effective offline RL methods such as CQL, to allow agents to connect past data with recent experience, enabling a kind of “common sense” generalization when the robot is tasked with performing a task
This article is purposely trimmed, please visit the source to read the full article.