SAGE: A Quantitative Evaluation of Socialized Evolution in Agent Ecosystems
2026-06-02 • Artificial Intelligence
Artificial Intelligence, Computation and Language
AI summary
The authors study how language agents improve when they can learn from their own experience alone versus when they can also see what other agents have done. They create a test called SAGE to compare agents working alone to those sharing their histories in groups, across different tasks like research, planning, and games. They find that sharing experiences doesn’t always help the best agents do better, but it can help agents who get stuck by themselves to improve more. Also, it’s not just about having more information but about summarizing and abstracting useful lessons from others. Overall, learning from peers depends on the agent type, the task, and how well they can use shared knowledge.
self-improving agents, social learning, SAGE framework, agent co-evolution, task feedback, peer history, abstraction, long-horizon planning, strategic multiplayer play, counterfactual controls
Authors
Linyue Pan, Yaoming Zhu, Lin Qiu, Xuezhi Cao, Xunliang Cai
Abstract
Self-improving language agents are typically evaluated in isolation: an agent attempts a task, receives feedback, and iteratively refines its own behavior. Yet agents increasingly operate alongside peers whose strategies and outcomes are publicly visible. This raises an under-studied question: when does shared experience produce improvements that self-improvement alone cannot achieve? We introduce SAGE (Social Agent Group Evolution), an evaluation framework that compares two compute-matched conditions: SocialEvo, where agents from five distinct model families co-evolve with access to all peers' histories; and SelfEvo, where each agent receives the same number of task attempts but sees only its own past, the conventional setup in self-improving agent studies. We instantiate SAGE in three arenas: open-ended ML research, long-horizon economic planning, and strategic multiplayer play, evaluated across multiple evolutionary rounds. We find that group history is not a universal amplifier: the strongest agent does not exceed its self-evolution ceiling. However, agents that plateau under self-improvement can achieve significant breakthroughs when peer experience is available. In competitive settings, counterfactual controls reveal that agents improve generally rather than developing opponent-specific strategies. Across different forms of shared history, filtered peer traces and reflective summaries often outperform raw logs, indicating that social gains depend on abstraction rather than exposure volume. These findings reveal that peer-history gains are agent-specific, arena-dependent, and contingent on the capacity to abstract transferable knowledge from public traces.
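The difference between the two compute-matched conditions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Attempt`, `visible_history`, `run_condition`, and `toy_attempt` are hypothetical, and the sketch assumes that agents read only attempts from earlier rounds, which the abstract does not specify. It shows only that both conditions give each agent the same number of attempts while differing in whose past attempts are visible.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    agent: str
    round_idx: int
    score: float


def visible_history(log, agent, condition):
    """Return the past attempts an agent may read before its next attempt."""
    if condition == "SocialEvo":
        # Every peer's history is visible, including the agent's own.
        return list(log)
    if condition == "SelfEvo":
        # Only the agent's own past attempts are visible.
        return [a for a in log if a.agent == agent]
    raise ValueError(f"unknown condition: {condition}")


def run_condition(agents, rounds, condition, attempt_fn):
    """Run `rounds` rounds; every agent makes exactly one attempt per round."""
    log = []
    for r in range(rounds):
        this_round = []
        for name in agents:
            context = visible_history(log, name, condition)
            this_round.append(Attempt(name, r, attempt_fn(name, r, context)))
        # Assumption: attempts become visible only after the round ends.
        log.extend(this_round)
    return log


def toy_attempt(name, round_idx, context):
    """Stand-in for a real task attempt: reports how much history was seen."""
    return float(len(context))


if __name__ == "__main__":
    agents = ["a", "b", "c", "d", "e"]  # placeholder for five model families
    for condition in ("SocialEvo", "SelfEvo"):
        log = run_condition(agents, 3, condition, toy_attempt)
        print(condition, [a.score for a in log if a.agent == "a"])
```

With the toy attempt function, the score simply counts the history entries an agent could read, so the two conditions differ in context size while the attempt budget per agent stays identical.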