Online Experiential Learning for Language Models
2026-03-17 • Computation and Language
AI summary
The authors propose a method called Online Experiential Learning (OEL) that lets language models improve continuously by learning from their own experiences during real use, instead of only learning from pre-collected data or simulations. Their approach first collects useful knowledge from user interactions, then updates the model using that knowledge without needing direct access to the user environment. By repeating these steps, the model gradually gets better at its tasks and becomes more efficient. They tested OEL on text-based games and found it consistently improves accuracy and efficiency without losing the ability to handle new situations.
Large Language Models, Online Learning, Experiential Learning, On-policy Learning, Context Distillation, Text-based Games, Transfer Learning, Model Efficiency, Out-of-distribution Performance, Interaction Trajectories
Authors
Tianzhu Ye, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei
Abstract
The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based game environments across multiple model scales and both thinking and non-thinking variants. OEL achieves consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. Our analysis further shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.
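The two-stage loop described in the abstract can be illustrated with a toy sketch. Here the "model" is a lookup-table policy, the "environment" is a trivial task (answer = task mod 3), and all function names are illustrative assumptions, not the authors' implementation; the point is only the loop structure: collect trajectories, extract rewarded knowledge, consolidate it into parameters without further environment access, and iterate.

```python
import random

# Toy stand-in for the user-side environment (hidden from stage 2).
GROUND_TRUTH = lambda task: task % 3

def collect_trajectories(policy, tasks, rng):
    """Stage 1a: deploy the policy and record (task, action, reward)."""
    return [(t, a, float(a == GROUND_TRUTH(t)))
            for t in tasks for a in [policy(t, rng)]]

def extract_knowledge(trajectories):
    """Stage 1b: keep only transferable knowledge -- here, the
    (task, action) pairs that earned reward."""
    return {t: a for t, a, r in trajectories if r > 0}

def consolidate(memory, knowledge):
    """Stage 2: fold knowledge into the model's 'parameters' (a dict),
    with no further access to the environment."""
    return {**memory, **knowledge}

def make_policy(memory):
    """Answer from consolidated memory if known, else explore randomly."""
    return lambda t, rng: memory.get(t, rng.choice([0, 1, 2]))

def oel_loop(tasks, rounds=5, seed=0):
    """Iterate the two stages; the improved policy collects
    higher-quality trajectories in later rounds."""
    rng = random.Random(seed)
    memory, coverage = {}, []
    for _ in range(rounds):
        trajs = collect_trajectories(make_policy(memory), tasks, rng)
        memory = consolidate(memory, extract_knowledge(trajs))
        coverage.append(len(memory) / len(tasks))  # tasks solved from memory
    return memory, coverage
```

Because knowledge only accumulates, the memory's coverage of tasks is non-decreasing across rounds, mirroring (in miniature) the consistent improvement over successive iterations the paper reports.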