Reasoning Shift: How Context Silently Shortens LLM Reasoning
2026-04-01 • Machine Learning
AI summary
The authors studied how large language models (LLMs) handle reasoning when given extra or complex context. They found that when problems are presented with extra unrelated information or inside bigger tasks, the models produce shorter reasoning steps and check their answers less often than when solving problems alone. This shorter reasoning doesn’t hurt simple problem performance but could cause issues with harder problems. The authors suggest that better understanding and managing context is important for improving these models’ reasoning reliability.
large language models • reasoning traces • self-verification • context management • uncertainty management • multi-turn conversation • task decomposition • robustness • scaling behavior
Authors
Gleb Rodionov
Abstract
Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-horizon reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this, we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task. We observe an interesting phenomenon: reasoning models tend to produce much shorter reasoning traces (up to 50% shorter) for the same problem under these context conditions than when the problem is presented in isolation. A finer-grained analysis reveals that this compression is associated with a decrease in self-verification and uncertainty-management behaviors, such as double-checking. While this behavioral shift does not compromise performance on straightforward problems, it might affect performance on more challenging tasks. We hope our findings draw additional attention to both the robustness of reasoning models and the problem of context management for LLMs and LLM-based agents.
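The three context conditions described in the abstract can be sketched programmatically. The snippet below is a minimal illustration, not the paper's actual evaluation code: the prompt templates, the `<answer>` placeholder for prior turns, and the example trace lengths are all hypothetical. It shows how one might construct the same problem under each condition and measure trace-length compression relative to the isolated baseline.

```python
def build_prompts(problem: str, distractor: str, other_tasks: list[str]) -> dict[str, str]:
    """Build the same problem under the three context conditions studied.

    Templates are illustrative stand-ins for the paper's actual prompts.
    """
    return {
        # (0) baseline: the problem alone
        "isolated": problem,
        # (1) lengthy, irrelevant context prepended to the problem
        "irrelevant_context": f"{distractor}\n\n{problem}",
        # (2) multi-turn conversation with independent prior tasks
        "multi_turn": "\n".join(
            f"User: {t}\nAssistant: <answer>" for t in other_tasks
        ) + f"\nUser: {problem}",
        # (3) the problem embedded as a subtask of a larger composite task
        "subtask": "Complete all of the following:\n"
        + "\n".join(f"- {t}" for t in other_tasks + [problem]),
    }


def compression(trace_lengths: dict[str, int]) -> dict[str, float]:
    """Reasoning-trace length of each condition relative to the isolated baseline."""
    base = trace_lengths["isolated"]
    return {cond: length / base for cond, length in trace_lengths.items()}


# Hypothetical trace lengths (in tokens) for one problem; a ratio of ~0.5
# would correspond to the "up to 50% shorter" effect reported above.
ratios = compression({"isolated": 1200, "irrelevant_context": 620, "subtask": 900})
```

In a real run, the trace lengths would come from querying a reasoning model with each prompt variant and counting the tokens in its reasoning trace; the ratios then quantify how much the context condition compresses the model's reasoning.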