Security Considerations for Artificial Intelligence Agents
2026-03-12 • Machine Learning
Machine Learning • Artificial Intelligence • Cryptography and Security
AI agent, code-data separation, prompt injection, confused-deputy problem, sandboxing, multi-agent systems, security benchmarks, policy enforcement, NIST risk management
Authors
Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma
Abstract
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.
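The abstract's notion of "deterministic policy enforcement for high-consequence actions" can be illustrated with a minimal sketch: a policy gate that sits between an agent's proposed tool calls and their execution, applying fixed rules the model cannot override. All names here (`ToolCall`, `PolicyGate`, the example tool identifiers) are hypothetical illustrations, not APIs from the article.

```python
# Hypothetical sketch of a deterministic policy gate for agent tool calls.
# High-consequence tools always require human approval; unknown tools are
# denied by default (a mitigation for confused-deputy behavior).
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolCall:
    tool: str          # e.g. "web.fetch", "email.send" (illustrative names)
    args: tuple = ()   # immutable arguments, for auditability

@dataclass
class PolicyGate:
    high_consequence: set = field(
        default_factory=lambda: {"email.send", "payments.transfer"})
    allowlist: set = field(
        default_factory=lambda: {"web.fetch", "calc.eval"})

    def decide(self, call: ToolCall) -> str:
        # Decisions are deterministic functions of the call, independent of
        # any model output, so a prompt-injected request cannot widen them.
        if call.tool in self.high_consequence:
            return "require_approval"
        if call.tool in self.allowlist:
            return "allow"
        return "deny"  # default-deny for anything unrecognized

gate = PolicyGate()
print(gate.decide(ToolCall("web.fetch")))          # allow
print(gate.decide(ToolCall("payments.transfer")))  # require_approval
print(gate.decide(ToolCall("shell.exec")))         # deny
```

The design choice worth noting is the default-deny fall-through: the gate enumerates what is permitted rather than what is forbidden, so newly added or attacker-suggested tools start out blocked until a policy decision is made about them.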