Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
2026-02-23 • Cryptography and Security • Machine Learning
AI summary
The authors study how large language model (LLM) agents, which can be extended with third-party skills, are vulnerable to harmful instructions hidden inside those skills. They introduce a benchmark called SkillInject to measure how easily agents can be tricked into harmful actions such as stealing data or deleting files. Their tests show that many current LLM agents frequently follow these injected instructions, and that fixing the problem requires more than larger models or simple input filters. The authors argue for context-aware permission systems to keep these agents safe.
Large Language Models (LLMs) · Agent Skills · Prompt Injection · SkillInject Benchmark · Code Execution · Data Exfiltration · Security Vulnerabilities · Context-Aware Authorization · Ransomware · Input Filtering
Authors
David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
Abstract
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security, in terms of harmful instruction avoidance, and utility, in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable, with attack success rates of up to 80% for frontier models, often executing extremely harmful instructions including data exfiltration, destructive actions, and ransomware-like behavior. The results further suggest that this problem will not be solved through model scaling or simple input filtering, but that robust agent security will require context-aware authorization frameworks. Our benchmark is available at https://www.skill-inject.com/.
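To make the evaluation setup concrete, the sketch below shows how a SkillInject-style harness might score an agent on injection-task pairs: security as the rate at which the agent avoids the injected harmful instruction, and utility as the rate at which it still completes the legitimate task. This is a minimal illustration under assumed names (InjectionTaskPair, run_agent, harmful_marker); the benchmark's actual data format and scoring code are not described in the abstract.

```python
# Hypothetical sketch of a SkillInject-style evaluation loop.
# Names (InjectionTaskPair, run_agent, harmful_marker, task_done) are
# illustrative assumptions, not the benchmark's actual API.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class InjectionTaskPair:
    skill_file: str      # attacker-controlled skill text containing the hidden injection
    user_task: str       # legitimate task the user asked the agent to perform
    harmful_marker: str  # action string indicating the injected instruction was executed

def evaluate(pairs: List[InjectionTaskPair],
             run_agent: Callable[[str, str], Dict]) -> Dict[str, float]:
    """Return attack success rate (security failure) and task compliance (utility)."""
    attacks_succeeded = 0
    tasks_completed = 0
    for pair in pairs:
        # The agent sees the third-party skill file plus the user's legitimate task.
        trace = run_agent(pair.skill_file, pair.user_task)
        # Security: did the agent carry out the injected harmful instruction?
        if pair.harmful_marker in trace.get("actions", []):
            attacks_succeeded += 1
        # Utility: did the agent still complete the legitimate task?
        if trace.get("task_done", False):
            tasks_completed += 1
    n = len(pairs)
    return {
        "attack_success_rate": attacks_succeeded / n,
        "task_compliance": tasks_completed / n,
    }
```

Reporting both numbers matters: a trivially "secure" agent that refuses everything would score zero on attack success but also zero on compliance, so the two metrics have to be read together.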