Along with researchers from Google Brain and OpenAI, we are releasing a paper on Unsolved Problems in ML Safety. Due to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. As a preview of the paper, in this post we consider a subset of the paper’s directions, namely withstanding hazards (“Robustness”), identifying hazards (“Monitoring”), and steering ML systems (“Alignment”).
Robustness research aims to build systems that are less vulnerable to extreme hazards and to adversarial threats. Two problems in robustness are robustness to long tails and robustness to adversarial examples.
Examples of long tail events. First row, left: an ambulance in front of a green light. First row, middle: birds on the road. First row, right: a reflection of a pedestrian. Bottom row, left: a group of people cosplaying. Bottom row, middle: a foggy road. Bottom row, right: a person partly occluded by a board on their back. (Source)
To operate in open-world high-stakes environments, ML systems will need to endure unusual events and tail risks. However, current ML systems are brittle in the face of real-world complexity, and they become unreliable when faced with novel situations like those above. To reduce the chance that ML systems choose a wrong course of action in environments dominated by rare events, models need to be unusually robust.
Adversarial perturbations. An example of an input image altered by an adversarial perturbation. After the adversarial perturbation, the neural network makes a high confidence mistake. (Source)
Adversaries can easily manipulate vulnerabilities in ML systems and cause them to make mistakes. As shown above, carefully crafted small perturbations are enough to break ML systems. In the paper, we focus on this problem but also suggest that researchers consider
This article is purposely trimmed, please visit the source to read the full article.
The post Unsolved ML Safety Problems appeared first on The Berkeley Artificial Intelligence Research Blog.