TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation
2026-04-09 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors focus on improving personalized large language models (PLLMs), which are designed to match an individual user's needs and preferences. They point out that current methods struggle to remember and use a user's long history effectively. To address this, the authors introduce TSUBASA, a method that improves how the model updates its memory and how it learns from past interactions on its own. Evaluations show that TSUBASA outperforms other memory-augmented systems such as Mem0 and Memory-R1, delivering more accurate personalization with a smaller token budget.
Personalized large language models, Long-horizon tasks, Memory mechanisms, Retrieval-Augmented Generation (RAG), Parametric adaptation, Dynamic memory evolution, Context distillation, Qwen model, Memory-augmented systems, Pareto improvement
Authors
Xinliang Frederick Zhang, Lu Wang
Abstract
Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with an individual's needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped by a quality-efficiency tradeoff. Meanwhile, parametric adaptation is bottlenecked by a train-inference gap due to the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach designed to improve memory writing via dynamic memory evolution, and memory reading via self-learning with a context distillation objective to internalize user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, which surpasses competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirm that TSUBASA breaks the quality-efficiency barrier to achieve Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.
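As a rough illustration of the context distillation idea named in the abstract, the sketch below computes a token-level KL divergence between a teacher distribution (a model that sees the user context in its prompt) and a student distribution (the same model without that context). This is the generic form of context distillation and is not necessarily the objective used in TSUBASA; the function names and toy logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def context_distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student).

    teacher_logits: per-token logits computed WITH the user context (assumed input).
    student_logits: per-token logits computed WITHOUT that context (assumed input).
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same non-empty token span")
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two output positions over a three-token vocabulary.
teacher = [[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]]
student = [[1.0, 1.0, 1.0], [0.1, 3.0, 0.2]]
loss = context_distillation_loss(teacher, student)
```

In a training loop, the student's parameters would be updated to reduce this loss, so that behavior induced by the context is absorbed into the weights and the context no longer has to occupy the prompt at inference time.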