AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

2026-06-12 • Computation and Language

AI summary

The authors created AgentSpec, a system that breaks down complex AI agents into smaller, reusable parts with clear roles, making it easier to swap and test different components. They used this system to study how different parts like memory and reasoning work together across various tasks, finding that overall performance depends more on how these parts fit and interact than on each part alone. Their work helps researchers better understand and build flexible AI agents by providing a common framework to compare designs. They also made their code and tools available for others to use.

LLM agents, modular design, policy components, memory systems, reasoning, reflection, reinforcement learning, scaffold compatibility, composability, embodied AI

Authors

Jixuan Chen, Jianzhi Shen, Haoqiang Kang, Zhi Hong, Qingyi Jiang, Soham Bose, Yiming Zhang, Leon Leng, Amit Vyas, Lingjun Mao, Siru Ouyang, Kun Zhou, Lianhui Qin

Abstract

LLM agents are increasingly built not as single model calls, but as scaffolded systems that combine reasoning, memory, reflection, action execution, and learning. While such scaffolds often improve performance, they are typically embedded in tightly coupled pipelines, making it difficult to isolate component contributions, compare alternative designs, or understand how module interactions shape agent behavior. We introduce AgentSpec, a modular specification framework that represents embodied agents as typed compositions of reusable policy components. AgentSpec standardizes the interfaces among perception, memory, reasoning, reflection, action, and optional learning, so that components can be swapped and recombined under controlled conditions. We instantiate this framework across DeliveryBench, ALFRED, MiniGrid, and RoboTHOR, and analyze reasoning, memory, reflection, and reinforcement-learning modules across model backbones. Our results show that agent performance is governed by scaffold compatibility and interaction effects rather than by isolated module strength. In particular, structured multi-granularity memory improves long-horizon state tracking, reasoning and memory interact non-uniformly across environments, reflection trades off correction against cost, and RL-trained policies compose best when they are optimized with the scaffold structure used at deployment time. AgentSpec provides a controlled foundation for studying, comparing, and designing composable LLM agents. Our code, baselines, and interactive playground are publicly available at https://agentspec-embodied.github.io.
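
To make the idea of typed, swappable components concrete, here is a minimal Python sketch. It is not the AgentSpec API: every name in it (`Memory`, `Reasoner`, `Reflector`, `Agent`, `run_episode`, `WindowMemory`, `NoMemory`, `KeySeekingReasoner`) is a hypothetical stand-in, and the rule-based reasoner replaces what would be an LLM call. It only illustrates how fixing the interfaces lets one component change while the rest of the scaffold is held constant.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Protocol


# Hypothetical interfaces; the real AgentSpec interfaces may look different.
class Memory(Protocol):
    def write(self, observation: str) -> None: ...
    def read(self) -> str: ...


class Reasoner(Protocol):
    def decide(self, observation: str, context: str) -> str: ...


class Reflector(Protocol):
    def review(self, action: str, context: str) -> str: ...


class NoMemory:
    """Baseline memory: stores nothing and returns an empty context."""

    def write(self, observation: str) -> None:
        pass

    def read(self) -> str:
        return ""


@dataclass
class WindowMemory:
    """Keeps the most recent `size` observations."""

    size: int = 3
    items: List[str] = field(default_factory=list)

    def write(self, observation: str) -> None:
        self.items = (self.items + [observation])[-self.size:]

    def read(self) -> str:
        return " | ".join(self.items)


class KeySeekingReasoner:
    """Toy rule-based stand-in for an LLM reasoning call."""

    def decide(self, observation: str, context: str) -> str:
        if "door" in observation and "key" in context:
            return "open door"
        if "key" in observation:
            return "pick up key"
        return "explore"


@dataclass(frozen=True)
class Agent:
    """A scaffold is a composition of components behind fixed interfaces."""

    memory: Memory
    reasoner: Reasoner
    reflector: Optional[Reflector] = None  # reflection is optional

    def step(self, observation: str) -> str:
        context = self.memory.read()
        action = self.reasoner.decide(observation, context)
        if self.reflector is not None:
            action = self.reflector.review(action, context)
        self.memory.write(observation)
        return action


def run_episode(agent: Agent, observations: List[str]) -> List[str]:
    """Feed a fixed observation sequence to the agent and collect its actions."""
    return [agent.step(obs) for obs in observations]


observations = ["key on floor", "empty hallway", "locked door"]
with_memory = Agent(memory=WindowMemory(size=3), reasoner=KeySeekingReasoner())
# Controlled swap: only the memory component changes.
without_memory = replace(with_memory, memory=NoMemory())

print(run_episode(with_memory, observations))
print(run_episode(without_memory, observations))
```

In this toy episode the agent with `WindowMemory` outputs `['pick up key', 'explore', 'open door']`, while the otherwise identical agent with `NoMemory` outputs `['pick up key', 'explore', 'explore']`. The difference arises only because the reasoner consults the memory context, a small illustration of why a component's contribution depends on how it interacts with the rest of the scaffold.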
