Language Models Need Sleep

2026-05-25

Computation and Language, Artificial Intelligence
AI summary

The authors propose a method for large language models to better handle long pieces of text by mimicking sleep. Their model 'sleeps' by reviewing and consolidating information offline, updating its memory through a special learned process, then clearing its working memory to stay fast during actual use. They tested this on synthetic reasoning tasks and math problems where normal models struggle, finding that longer 'sleep' times led to better results, especially for complex reasoning. This approach shifts some work to offline time to keep real-time predictions quick.

transformer, large language model, attention mechanism, fast weights, state-space model, sleep consolidation, recurrent passes, multi-hop reasoning, cellular automata, math reasoning
Authors
Sangyun Lee, Sean McLeish, Tom Goldstein, Giulia Fanti
Abstract
Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. During inference, this shifts extra computation to sleep while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which a regular transformer as well as SSM-attention hybrid models fail. We then show that increasing sleep duration $N$ for our models improves performance, with the largest gains on examples that require deeper reasoning.
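
The sleep-wake cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToySleeper`, its methods, the learning rate, and the example key-value pairs are all hypothetical, and a hand-written delta rule stands in for the learned local rule, whose form the abstract does not give. The sketch only shows the control flow: context accumulates in a cache while awake, $N$ offline passes fold it into a fast-weight matrix, and the cache is then cleared.

```python
class ToySleeper:
    """Toy stand-in for a single fast-weight block (hypothetical, not the paper's model)."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # assumed step size for the stand-in update rule
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache = []  # wake-time context, stored as (key, value) pairs

    def observe(self, key, value):
        """Wake: append new context to the cache without touching the weights."""
        self.kv_cache.append((key, value))

    def read(self, key):
        """Retrieve a value from the fast weights alone (matrix-vector product)."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]

    def sleep(self, num_passes):
        """Sleep: run num_passes offline passes over the cache, then clear it."""
        for _ in range(num_passes):
            for key, value in self.kv_cache:
                pred = self.read(key)
                for i in range(self.dim):
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        # Delta rule: an assumed placeholder for the learned local rule.
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()


def recall_error(num_passes):
    """Squared error when recalling stored pairs from fast weights after sleeping."""
    pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.6, 0.8], [0.0, 1.0])]
    model = ToySleeper(dim=2)
    for key, value in pairs:
        model.observe(key, value)
    model.sleep(num_passes)
    assert not model.kv_cache  # the cache is empty after sleep
    return sum(
        (v - p) ** 2
        for key, value in pairs
        for v, p in zip(value, model.read(key))
    )


if __name__ == "__main__":
    for n in (0, 1, 4, 32):
        print(n, recall_error(n))
```

In this toy setting the recall error falls from 2.0 with no sleep passes to about 0.71 after one pass and keeps shrinking as the number of passes grows. That mirrors the qualitative claim that longer sleep helps, but says nothing about the magnitude of the effect in the actual model.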