Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
2026-06-12 • Artificial Intelligence
Artificial Intelligence • Computation and Language
AI summary
The authors point out that large language models usually read their context as sequential text, which fits poorly with modern agent workflows where many branches run in parallel. They introduce Parallel-Synthesis, a method that lets a synthesizer combine the outputs of parallel branches directly from their KV caches, without first converting everything back into text. The system pairs a cache mapper, which calibrates the independently generated caches, with a fine-tuned synthesizer adapter. Across nine datasets covering math, science QA, code generation, GAIA, and multi-agent database diagnosis, it matches or outperforms text-based synthesis on seven and remains close on the other two, while reducing time-to-first-token by 2.5x-11x. The authors take this as evidence that consuming branch caches directly is a promising way for language models to combine information.
large language models, agent workflows, parallel processing, KV cache, synthesizer adapter, text concatenation, fine-tuning, multi-agent systems, code generation, time-to-first-token
Authors
Shikun Liu, Mufei Li, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li
Abstract
Large language models increasingly serve as execution engines for agentic systems, yet they still consume context through a sequential text interface. This creates a mismatch with modern structured agent workflows, in which independent branches explore subtasks, retrieve evidence, or generate candidate solutions before a final synthesis step. Existing systems typically merge these branches by concatenating their textual outputs, which discards the parallel structure and incurs redundant prefill computation. In this work, we introduce Parallel-Synthesis, a plug-and-play framework that enables a synthesizer to directly consume the KV caches produced by parallel worker agents. Parallel-Synthesis combines a cache mapper that calibrates independently generated branch caches with a fine-tuned synthesizer adapter that enables generation from this non-sequential cache interface. We train Parallel-Synthesis using data that exposes the synthesizer to parallel cache contexts, teaches aggregation across cached branches, and distills reasoning behavior from standard text-concatenation-based synthesis. Across nine downstream datasets spanning math, science QA, code generation, GAIA, and multi-agent database diagnosis, Parallel-Synthesis matches or outperforms text-based synthesis on seven datasets and remains close on the other two. It also reduces time-to-first-token by 2.5x-11x, suggesting that direct cache-based synthesis is a promising interface for more native and efficient synthesis over parallel agent branches.
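To make the contrast between the two interfaces concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: `BranchCache`, `make_cache`, `merge_branch_caches`, and the two token-counting helpers are hypothetical names introduced for illustration, the caches are plain lists rather than model tensors, and the cache mapper's calibration step is deliberately left out. The sketch only shows why reusing branch caches can shrink the synthesizer's prefill: with text concatenation every branch token is prefilled again, whereas with cache reuse only the synthesizer's own prompt is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BranchCache:
    """Hypothetical stand-in for one worker's KV cache: one key and one value vector per token."""
    keys: List[List[float]]
    values: List[List[float]]

    def __len__(self) -> int:
        return len(self.keys)


def make_cache(num_tokens: int, fill: float) -> BranchCache:
    """Build a toy cache with 2-dimensional key/value vectors (illustrative sizes only)."""
    keys = [[fill, fill] for _ in range(num_tokens)]
    values = [[fill, fill] for _ in range(num_tokens)]
    return BranchCache(keys, values)


def merge_branch_caches(caches: List[BranchCache]) -> BranchCache:
    """Join branch caches along the token axis.

    Assumption: a real system would calibrate each cache first (the role the
    abstract assigns to the cache mapper); that step is omitted here.
    """
    merged_keys: List[List[float]] = []
    merged_values: List[List[float]] = []
    for cache in caches:
        merged_keys.extend(cache.keys)
        merged_values.extend(cache.values)
    return BranchCache(merged_keys, merged_values)


def prefill_tokens_text_concat(branch_lengths: List[int], prompt_len: int) -> int:
    """Text interface: the synthesizer re-reads every branch token plus its own prompt."""
    return sum(branch_lengths) + prompt_len


def prefill_tokens_cache_reuse(branch_lengths: List[int], prompt_len: int) -> int:
    """Cache interface: branch tokens are already cached, so only the prompt is prefilled."""
    return prompt_len


branch_a = make_cache(3, 0.1)  # worker A produced 3 tokens
branch_b = make_cache(2, 0.2)  # worker B produced 2 tokens
merged = merge_branch_caches([branch_a, branch_b])

lengths = [len(branch_a), len(branch_b)]
text_cost = prefill_tokens_text_concat(lengths, prompt_len=4)
cache_cost = prefill_tokens_cache_reuse(lengths, prompt_len=4)
print(len(merged), text_cost, cache_cost)
```

The token counts are a proxy only: the reported 2.5x-11x time-to-first-token reduction is an empirical result from the paper and cannot be derived from this toy model, which also ignores any cost of the mapper itself.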