MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
2026-04-09 • Multiagent Systems
AI summary
The authors observe that large language models tend to forget information and make mistakes when reasoning over long, scattered contexts. To address this, they propose MemCoT, a method that has the model search for relevant information step by step instead of all at once. MemCoT uses a long-term memory module that zooms in on important details and zooms out to recover the surrounding context, plus a short-term memory that keeps track of search progress. Their tests show that MemCoT improves reasoning accuracy on challenging long-context benchmarks.
Large Language Models, Hallucination, Catastrophic Forgetting, Causal Reasoning, Long Context, Memory Mechanisms, Iterative Search, Short-term Memory, Semantic State Memory, Episodic Memory
Authors
Haodong Lei, Junming Liu, Yirong Chen, Ding Wang, Hongsong Wang
Abstract
Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting during causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step passive matching process, leading to severe semantic dilution and contextual fragmentation. To overcome these bottlenecks, we propose MemCoT, a test-time memory scaling framework that transforms long-context reasoning into an iterative, stateful information search. MemCoT introduces a multi-view long-term memory perception module that enables Zoom-In evidence localization and Zoom-Out contextual expansion, allowing the model to first identify where relevant evidence resides and then reconstruct the surrounding causal structure necessary for reasoning. In addition, MemCoT employs a task-conditioned dual short-term memory system composed of semantic state memory and episodic trajectory memory. This short-term memory records historical search decisions and dynamically guides query decomposition and pruning across iterations. Empirical evaluations demonstrate that, empowered by MemCoT, several open- and closed-source models achieve state-of-the-art (SOTA) performance on the LoCoMo and LongMemEval-S benchmarks.
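To make the control flow described in the abstract more concrete, here is a minimal toy sketch of an iterative, stateful search with a Zoom-In step, a Zoom-Out step, and a two-part short-term memory. It is not the authors' implementation. Every name in it (`zoom_in`, `zoom_out`, `ShortTermMemory`, `iterative_search`) is hypothetical, keyword overlap stands in for whatever retrieval scoring MemCoT actually uses, and the sub-queries are supplied by hand, whereas the abstract describes them as produced by dynamic query decomposition.

```python
from dataclasses import dataclass, field


def tokens(text):
    """Lowercased word set; a crude stand-in for a real retriever's representation."""
    for ch in ".,?!":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def zoom_in(chunks, query, skip):
    """Zoom-In (toy): index of the unvisited chunk with the most word overlap, or None."""
    q = tokens(query)
    best, best_score = None, 0
    for i, chunk in enumerate(chunks):
        if i in skip:
            continue
        score = len(q & tokens(chunk))
        if score > best_score:
            best, best_score = i, score
    return best


def zoom_out(chunks, index, radius=1):
    """Zoom-Out (toy): indices of the hit plus its neighbours within `radius`."""
    lo = max(0, index - radius)
    hi = min(len(chunks), index + radius + 1)
    return list(range(lo, hi))


@dataclass
class ShortTermMemory:
    # Assumed reading of "semantic state memory": evidence gathered so far.
    evidence: list = field(default_factory=list)
    # Assumed reading of "episodic trajectory memory": (query, hit) search history.
    trajectory: list = field(default_factory=list)


def iterative_search(chunks, sub_queries, max_steps=4, radius=1):
    """Run up to `max_steps` search iterations, carrying state between them."""
    stm = ShortTermMemory()
    pending = list(sub_queries)
    for _ in range(max_steps):
        if not pending:
            break
        query = pending.pop(0)
        # Pruning: skip a sub-query that the trajectory shows was already tried.
        if any(past == query for past, _ in stm.trajectory):
            continue
        hit = zoom_in(chunks, query, skip=set(stm.evidence))
        stm.trajectory.append((query, hit))
        if hit is None:
            continue
        for i in zoom_out(chunks, hit, radius):
            if i not in stm.evidence:
                stm.evidence.append(i)
    return stm


chunks = [
    "Alice adopted a dog in March.",
    "The dog was named Biscuit.",
    "Bob moved to Lisbon in June.",
    "He started a bakery there.",
]
stm = iterative_search(chunks, ["dog adopted", "Bob moved", "dog adopted"])
print([chunks[i] for i in stm.evidence])
print(stm.trajectory)
```

In this toy run the repeated sub-query is pruned because the trajectory already records it, and the Zoom-Out window pulls in the chunk that follows each hit, which a single-step match on the query words alone would have missed.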