Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

2026-03-31

Cryptography and Security, Artificial Intelligence
AI summary

The authors explain that AI agents built on large language models can be tricked by malicious instructions hidden in the data they process, causing them to take harmful actions. They propose three main defensive ideas: updating plans and security policies as situations change; strictly limiting what the model can observe and decide so that the surrounding system retains control; and involving humans, especially when the situation is ambiguous. They also argue that current benchmarks for these problems can be misleading, and emphasize that building defenses into the overall system, combining rule-based and model-based security checks, is key to safer agents and to more targeted research.

Large Language Models, Prompt Injection, AI Security, Dynamic Replanning, System-level Defense, Human-in-the-loop, Model Robustness, Context-dependent Decisions, Benchmark Limitations, Agentic Systems
Authors
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh
Abstract
AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.
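Positions (2) and (3) above can be pictured as a layered gate in front of agent actions. The sketch below is a hypothetical illustration, not the authors' implementation: all names (`Decision`, `gate`, the high-risk action set) are invented for this example. A deterministic rule layer runs first; only when rules cannot decide is a learned model consulted, and that model sees only a minimal structured view of the proposed action (never the raw untrusted content) and may only return one of three labels, with malformed or ambiguous outputs escalated to a human.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ASK_HUMAN = "ask_human"

# Hypothetical set of actions treated as high-risk in this sketch.
HIGH_RISK_ACTIONS = {"send_email", "transfer_funds", "delete_file"}

def rule_check(action, target_domain, allowlist):
    """Deterministic rule layer: runs before any learned model is consulted.
    Returns a Decision, or None when the rules alone cannot decide."""
    if action not in HIGH_RISK_ACTIONS:
        return Decision.ALLOW
    if target_domain in allowlist:
        return Decision.ALLOW
    return None  # undecided: fall through to the constrained model check

def constrained_model_check(action, target_domain, model_fn):
    """Model-based layer with a strictly constrained interface: the model
    sees only a minimal structured view (action name + target domain) and
    may only return one of the three Decision labels; anything else is
    escalated to a human."""
    label = model_fn({"action": action, "domain": target_domain})
    try:
        return Decision(label)
    except ValueError:
        return Decision.ASK_HUMAN  # malformed model output: defer to a human

def gate(action, target_domain, allowlist, model_fn):
    """System-level gate combining rule-based and model-based checks."""
    verdict = rule_check(action, target_domain, allowlist)
    if verdict is not None:
        return verdict
    return constrained_model_check(action, target_domain, model_fn)
```

Because the model's observation and output spaces are fixed by the system, an injected instruction in untrusted data cannot widen what the model is allowed to decide; at worst it flips a constrained label, and unrecognized outputs fall back to human review.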