Efficient and Sound Probabilistic Verification for AI Agents

2026-06-18 • Cryptography and Security

Cryptography and Security, Artificial Intelligence

AI summary

The authors aim to improve security for AI agents operating in complex digital environments by monitoring their actions at runtime against formal rules called policies. Previous methods handle only deterministic, yes-or-no rules; the authors address cases where the rules involve uncertainty, such as a PII detector that fails with some probability on each invocation. They introduce a framework that computes sound upper bounds on the probability that an agent violates such a policy, even when the uncertain components may be correlated in unknown ways. On standard benchmarks for terminal and tool-calling agents, their method outperforms prior approaches and improves the security-utility trade-off.

AI agents, runtime monitoring, Datalog, probabilistic policies, security policies, distributionally robust optimization, policy violation probability, probabilistic inference, correlations, PII detection

Authors

Alaia Solko-Breslin, Pramod Kaushik Mudrakarta, Mihai Christodorescu, Somesh Jha, Krishnamurthy Dj Dvijotham

Abstract

Securing AI agents that operate in complex digital environments has become a critical need, and runtime monitoring approaches that formulate and enforce policies expressed in a formal language like Datalog offer a promising solution. However, existing approaches are restricted to deterministic policies. In many practical applications of AI agents, there is a need to enforce security policies in the face of ambiguity, leading to probabilistic predicates or state transitions (for example, a declassifier or Personally Identifiable Information (PII) detector that has some failure probability on each invocation). Furthermore, in many such applications, one cannot easily make the independence assumptions necessary to invoke prior work on probabilistic inference in Datalog. We address this by introducing a sound and efficient framework for such verification based on distributionally robust optimization, computing sound upper bounds on the probability of policy violation regardless of possible correlations between predicates. On standard benchmarks for terminal and tool-calling agents, we demonstrate that our approach outperforms prior art and improves the security-utility trade-off while ensuring rigorous bounds on the probability of policy violation.
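
To illustrate why the independence assumption matters, the sketch below compares the violation probability computed under independence with the classical Fréchet bounds, which hold for any correlation between events. This is not the paper's algorithm, which is based on distributionally robust optimization; it is only a toy illustration, and the function names and the 0.1 failure probability are assumptions made for the example.

```python
from math import prod


def conj_independent(ps):
    """P(all events occur), assuming the events are independent."""
    return prod(ps)


def conj_worst_case(ps):
    """Frechet upper bound on P(all events occur) under any correlation."""
    return min(ps)


def disj_independent(ps):
    """P(at least one event occurs), assuming the events are independent."""
    return 1.0 - prod(1.0 - p for p in ps)


def disj_worst_case(ps):
    """Frechet (union) upper bound on P(at least one event occurs)."""
    return min(1.0, sum(ps))


# Hypothetical scenario: a PII detector is invoked twice and misses
# with probability 0.1 on each invocation.
miss = [0.1, 0.1]

# Violation requires BOTH invocations to miss.
both_indep = conj_independent(miss)   # assumes independent failures
both_sound = conj_worst_case(miss)    # valid for any correlation

# Violation requires EITHER invocation to miss.
either_indep = disj_independent(miss)
either_sound = disj_worst_case(miss)

print(f"both miss:   independent={both_indep:.3f}, sound bound={both_sound:.3f}")
print(f"either miss: independent={either_indep:.3f}, sound bound={either_sound:.3f}")
```

In the "both miss" case, the independence assumption gives a figure ten times smaller than the correlation-agnostic bound. If the two failures are in fact perfectly correlated (for example, the same input fools the detector both times), the independent estimate understates the true risk, which is the kind of unsoundness that a correlation-robust bound is meant to avoid.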
