MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

2026-04-09
Computation and Language

AI summary

The authors observe that language models lose accuracy when instructions and context are spread over many conversational turns, and that the routine approach of appending the full chat history quickly fills the context window, raising latency and cost. They propose MT-OSC, a method that condenses past chat content in the background, keeping essential details while reducing the amount of text the model has to process. The approach cuts token counts by up to 72% in 10-turn dialogues and improves or preserves accuracy on multi-turn benchmarks while lowering latency and operational cost. Evaluated on 13 language models and a range of multi-turn benchmarks, MT-OSC remains robust to distractors and irrelevant turns.

Large Language Models, Multi-turn Dialogue, Context Window, Chat History, Token Reduction, Few-shot Inference, Performance Optimization, Latency, Operational Cost, Conversational AI
Authors
Jyotika Singh, Fang Tu, Miguel Ballesteros, Weiyi Sun, Sandip Ghoshal, Michelle Yuan, Yassine Benajiba, Sujith Ravi, Dan Roth
Abstract
Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, yielding improved or preserved accuracy across datasets while remaining robust to distractors and irrelevant turns. Our results establish MT-OSC as a scalable solution for multi-turn chats, enabling richer context within constrained input spaces, reducing latency and operational cost, while balancing performance.
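
The abstract names a Condenser and a Decider but does not describe how either works internally. The sketch below is only an illustration of the general idea of condensing history turn by turn and letting a lightweight check decide whether to keep the condensed version. Everything in it is an assumption, not the authors' method: the function names (`count_tokens`, `decide`, `condense_history`, `toy_condenser`), the whitespace token count, and the "keep only if shorter and non-empty" rule are all hypothetical. In a real system the condenser would be a few-shot LLM call rather than a line filter.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude whitespace count standing in for a real tokenizer (assumption).
    return len(text.split())


def decide(original: str, condensed: str) -> str:
    # Hypothetical Decider rule: accept the condensed history only if it is
    # non-empty and strictly shorter than the original; otherwise keep the original.
    if condensed.strip() and count_tokens(condensed) < count_tokens(original):
        return condensed
    return original


def condense_history(turns: List[str], condenser: Callable[[str], str]) -> str:
    # After each new turn, append it to the running history, ask the
    # condenser for a shorter version, and let the decider pick one.
    history = ""
    for turn in turns:
        candidate = (history + "\n" + turn).strip()
        history = decide(candidate, condenser(candidate))
    return history


def toy_condenser(text: str) -> str:
    # Placeholder for an LLM-based condenser: drops lines that start with "thanks".
    kept = [line for line in text.splitlines()
            if not line.lower().startswith("thanks")]
    return "\n".join(kept)


turns = [
    "Write a function that sorts a list.",
    "thanks, that helps",
    "It must be stable.",
]
result = condense_history(turns, toy_condenser)
print(result)
```

With the toy condenser, the filler turn is dropped while both instruction-bearing turns survive, which is the behavior a condensation step is meant to approximate.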