MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

2026-05-26, Artificial Intelligence

Artificial Intelligence, Computation and Language, Machine Learning, Multiagent Systems
AI summary

The authors present MUSE-Autoskill Agent, a system that helps AI language models get better at solving tasks by creating, saving, organizing, testing, and improving their skills over time instead of treating each skill as a single-use artifact. Their framework lets the AI remember how well each skill worked in past tasks, so it can reuse and adapt skills more effectively. Experiments on SkillsBench provide initial evidence that managing skills this way can improve task success, efficiency, skill reuse, and the transfer of skills between different AI agents.

large language models, skill creation, skill reuse, memory mechanism, skill evaluation, continuous refinement, agent framework, task solving, SkillsBench
Authors
Huawei Lin, Peng Li, Jie Song, Fuxin Jiang, Tieying Zhang
Abstract
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.
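
To make the lifecycle in the abstract concrete, the following is a minimal sketch of how creation, skill-level memory, selection, unit-test evaluation, and refinement could fit together. It is not the authors' implementation: the names `Skill`, `SkillLibrary`, `passes_tests`, `success_rate`, `create`, `record`, `select`, and `refine` are all hypothetical, and ranking skills by past success rate is an assumption made here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Skill:
    """A reusable skill with its own unit tests and usage history (hypothetical)."""
    name: str
    run: Callable[..., object]
    # Unit tests as (args, expected_result) pairs.
    tests: List[Tuple[tuple, object]] = field(default_factory=list)
    # Skill-level memory: one (task_id, succeeded) record per use.
    history: List[Tuple[str, bool]] = field(default_factory=list)

    def passes_tests(self) -> bool:
        """Evaluation: every unit test must return its expected result."""
        for args, expected in self.tests:
            try:
                if self.run(*args) != expected:
                    return False
            except Exception:
                return False
        return True

    def success_rate(self) -> float:
        """Fraction of recorded uses that succeeded (0.0 if never used)."""
        if not self.history:
            return 0.0
        return sum(ok for _, ok in self.history) / len(self.history)


class SkillLibrary:
    """Stores skills and manages their lifecycle (hypothetical)."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def create(self, skill: Skill) -> bool:
        """Creation: admit a skill only if it passes its own unit tests."""
        if not skill.passes_tests():
            return False
        self.skills[skill.name] = skill
        return True

    def record(self, name: str, task_id: str, succeeded: bool) -> None:
        """Memory: append runtime feedback from one task to the skill."""
        self.skills[name].history.append((task_id, succeeded))

    def select(self, candidates: List[str]) -> Optional[Skill]:
        """Management: pick the known candidate with the best track record."""
        known = [self.skills[n] for n in candidates if n in self.skills]
        return max(known, key=Skill.success_rate, default=None)

    def refine(self, name: str, new_run: Callable[..., object]) -> bool:
        """Refinement: swap in a new implementation only if the tests still pass.

        The tests and accumulated history carry over to the refined skill.
        """
        old = self.skills[name]
        candidate = Skill(name, new_run, old.tests, old.history)
        if not candidate.passes_tests():
            return False
        self.skills[name] = candidate
        return True
```

In this sketch a skill is admitted or replaced only when its unit tests pass, while runtime outcomes accumulate per skill rather than per task, so a later `select` call can prefer skills that have worked before. A real system would likely need richer memory records and a retrieval step for choosing candidates, which this example leaves out.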