Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

2026-06-04 • Artificial Intelligence

AI summary

The authors studied how large language model (LLM) agents store, retrieve, and update memory during long-horizon tasks. They introduced a taxonomy that classifies agent memory systems along four axes and built a profiling harness that attributes cost to construction, retrieval, and generation. By profiling ten memory systems across two benchmark suites, they showed how design choices shift cost between the write and read paths. They also derived ten system recommendations for managing memory in these agents.

LLM agents, memory systems, retrieval, knowledge consolidation, cost profiling, system taxonomy, long-horizon tasks, memory management, generation, agent control flows

Authors

Yasmine Omri, Ziyu Gan, Zachary Broveak, Robin Geens, Zexue He, Alex Pentland, Marian Verhelst, Tsachy Weissman, Thierry Tambe

Abstract

LLM agents are increasingly deployed on long-horizon tasks requiring sustained reasoning over extended interaction histories. Realizing this at scale requires agents to persistently store, retrieve, and update their own memory across sessions. A rich ecosystem of agent memory systems has emerged spanning flat retrieval, LLM-mediated extraction, consolidating fact stores, and agentic control flows. Yet, their system-level behavior remains uncharacterized. We present the first systems characterization of agent memory. First, we introduce a system-oriented taxonomy classifying agent memory systems along four axes. Second, we build a phase-aware profiling harness attributing cost to construction, retrieval, and generation. Third, we characterize ten representative systems across two benchmark suites, uncovering how design choices shift cost across the write and read paths. Finally, we derive 10 system recommendations covering construction scheduling, capability floors, amortization via query volume, freshness-latency tradeoffs, and fleet-scale management.
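
To make the idea of phase-aware cost attribution concrete, here is a minimal sketch, not the authors' harness. It assumes cost is measured as wall-clock time and that each phase can be wrapped in a context manager. The names `PhaseProfiler`, `phase`, `breakdown`, and `run_demo` are hypothetical, and the toy flat-list memory stands in for a real memory system.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Accumulates wall-clock seconds and call counts per named phase.

    Hypothetical helper for illustration; the clock is injectable so the
    accounting can be checked deterministically.
    """

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Charge elapsed time to the phase even if the body raises.
            self.seconds[name] += self._clock() - start
            self.calls[name] += 1

    def breakdown(self):
        """Return each phase's share of the total measured time."""
        total = sum(self.seconds.values())
        if total == 0:
            return {name: 0.0 for name in self.seconds}
        return {name: s / total for name, s in self.seconds.items()}


def run_demo(profiler):
    """Toy flat-retrieval memory: write turns, look one up, build an answer."""
    memory = []
    with profiler.phase("construction"):  # write path
        for turn in ["alice likes tea", "bob likes coffee"]:
            memory.append(turn)
    with profiler.phase("retrieval"):  # read path
        hits = [m for m in memory if "tea" in m]
    with profiler.phase("generation"):  # stand-in for an LLM call
        answer = f"Based on memory: {hits[0]}"
    return answer


if __name__ == "__main__":
    profiler = PhaseProfiler()
    print(run_demo(profiler))
    print(profiler.breakdown())
```

In a sketch like this, a system that does heavy LLM-mediated extraction at write time would show a larger `construction` share, while a flat retrieval system would push more of its cost into `retrieval` and `generation`.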
