Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

2026-06-03 Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors focus on making sure reinforcement learning (RL) agents act safely, even when the environment's transitions are perturbed unpredictably. They build on a verification method that constructs probabilistic safety boundaries, called barrier-certificates, from sampled policy trajectories. Tight bounds on the probability of a safety violation are hard to obtain when perturbations push the agent into states it has not explored enough. To improve this, the authors use a type of neural network called a variational autoencoder (VAE) to approximate the distribution of states the agent encounters, and they use the latent features of those states to separate regions of known safe behaviour from unknown ones. They then optimize two barrier-certificates jointly, a more conservative lower-bound certificate and a looser upper-bound certificate, and sample states that fall between the two during training to tighten both bounds. Their experiments indicate that the bounds can be tightened in this way, giving sharper probabilistic safety guarantees.

reinforcement learning, safety guarantees, barrier-certificates, transition perturbations, variational autoencoder, state space, probabilistic bounds, policy verification, latent space, dual optimization
Authors
Mohit Prashant, Arvind Easwaran
Abstract
Guaranteeing safety is critical to the deployment of reinforcement learning (RL) agents in the real world, especially as policies learned using deep RL may be susceptible to transition perturbations that result in unknown or unsafe behaviour. One method of policy verification is to construct probabilistic barrier-certificates by sampling policy trajectories with respect to safety constraints, thereby demarcating known safe behaviour from unknown behaviour. Obtaining tight upper and lower bounds on the probability of violating these constraints may be difficult if the policy is susceptible to transition uncertainty or perturbation that places the agent in insufficiently explored states. To address this, we approximate the distribution of the encountered state space using a variational autoencoder (VAE) and construct upper- and lower-bound barrier-certificates using latent characteristics of states to optimize for regions of known, safe behaviour with high confidence. We frame this as a dual optimization problem in which the lower-bound barrier-certificate presents a more conservative estimate of the safe region than the upper-bound barrier-certificate. Sampling states that lie within the set difference of the two during training, i.e., the non-robust region, allows us to tighten the upper and lower bounds and so provide sharper probabilistic guarantees on safety. We describe the guarantees obtained and demonstrate the tightness of our bounds experimentally.
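As a rough illustration of the set-difference idea in the abstract, the following Python sketch rejection-samples points from the region that a looser certificate accepts but a more conservative one does not. Everything in it is an assumption made for illustration: the one-dimensional latent code `z`, the toy functions `barrier_lower` and `barrier_upper`, the convention that a non-positive barrier value means "certified safe", and the thresholds 1.0 and 1.5. The `hoeffding_interval` helper is a generic concentration bound swapped in to show what a sampled probabilistic bound looks like; the abstract does not say which bound the authors use.

```python
import math
import random


def hoeffding_interval(violations, n, delta):
    """Two-sided Hoeffding interval for a violation probability.

    Holds with probability at least 1 - delta for n i.i.d. samples.
    This is a generic bound, not necessarily the one used in the paper.
    """
    p_hat = violations / n
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps), min(1.0, p_hat + eps)


# Hypothetical barrier functions over a 1-D latent code z.
# Assumed convention: a value <= 0 means the state is certified safe.
def barrier_lower(z):
    """Conservative certificate: safe set is |z| <= 1.0."""
    return abs(z) - 1.0


def barrier_upper(z):
    """Looser certificate: safe set is |z| <= 1.5."""
    return abs(z) - 1.5


def in_non_robust_region(z):
    """True if z is accepted by the looser certificate only."""
    return barrier_upper(z) <= 0.0 and barrier_lower(z) > 0.0


def sample_non_robust(n, rng, low=-2.0, high=2.0):
    """Rejection-sample n latent codes from the set difference."""
    samples = []
    while len(samples) < n:
        z = rng.uniform(low, high)
        if in_non_robust_region(z):
            samples.append(z)
    return samples


rng = random.Random(0)
zs = sample_non_robust(100, rng)

# Example: 3 observed violations in 200 sampled trajectories.
lo, hi = hoeffding_interval(violations=3, n=200, delta=0.05)
```

In a training loop, the points in `zs` would be the ones worth decoding and rolling out, since they are exactly the states on which the two certificates disagree. Rejection sampling is used here only because it is easy to read; it becomes inefficient when the set difference is a small fraction of the latent space.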