Doc-to-Atom: Learning to Compile and Compose Memory Atoms
2026-06-10 • Computation and Language
Computation and Language • Information Retrieval
AI summary
The authors address the slow, memory-heavy processing of long documents by large language models. They build on a previous method, Doc-to-LoRA, which compresses a document's knowledge into a single adapter but suffers from interference on queries unrelated to parts of that adapter, limited compositional recall, and poor scaling to long documents. Their new system, Doc-to-Atom, breaks each document into smaller, semantically typed pieces called atoms, each compiled into its own micro-LoRA adapter and a retrieval key. At inference, a lightweight router combines only the atoms relevant to a specific query into one adapter for a frozen base model. Experiments on six question-answering benchmarks show that Doc-to-Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of internalizing documents.
Long input sequences, Large Language Models, Attention mechanism, Context distillation, LoRA adapters, Doc-to-LoRA, Parameter-efficient tuning, Compositional memory, Query routing, Multi-objective distillation
Authors
Xingjian Diao, Wenbo Li, Yashas Malur Saidutta, Avinash Amballa, Lazar Valkov, Srinivas Chappidi
Abstract
Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the quadratic cost of attention makes inference both memory-intensive and slow. Context distillation mitigates this by compressing contextual information into model parameters, and recent work such as Doc-to-LoRA amortizes context distillation into a single forward pass that generates one LoRA adapter per document. However, producing a single monolithic adapter for all queries leads to irrelevant-query interference, limited compositional recall, and poor scalability to long-document reasoning. To address these challenges, we propose Doc-to-Atom (Doc2Atom), a compositional parametric memory framework that decomposes each document into semantically typed knowledge atoms. Each atom is compiled into an independent micro-LoRA adapter and a provenance retrieval key. At inference time, a lightweight query router selects and assembles only the relevant atoms into a query-specific adapter, which is then injected into a frozen base model. The entire system is trained end-to-end through a multi-objective distillation framework. Experiments on six diverse QA benchmarks demonstrate that Doc2Atom outperforms Doc-to-LoRA baselines while reducing the memory cost of document internalization.
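The routing-and-assembly step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how atoms are scored or merged, so everything below is an assumption made for illustration: the names `Atom`, `route`, and `compose` are hypothetical, the router is stood in for by cosine similarity against each atom's key followed by a softmax over the top-k scores, and the query-specific adapter is taken to be a weighted sum of the selected low-rank updates B_i A_i. It is not the authors' implementation.

```python
import math
from dataclasses import dataclass

Matrix = list[list[float]]


@dataclass
class Atom:
    """Hypothetical memory atom: a retrieval key plus one micro-LoRA pair."""
    key: list[float]  # retrieval key in the query embedding space (assumed)
    A: Matrix         # r x d_in down-projection
    B: Matrix         # d_out x r up-projection


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def route(query: list[float], atoms: list[Atom], top_k: int) -> list[tuple[int, float]]:
    """Assumed router: keep the top_k atoms by cosine score, softmax their scores.

    Expects top_k >= 1 and a non-empty atom list.
    """
    scored = sorted(
        ((cosine(query, atom.key), i) for i, atom in enumerate(atoms)),
        reverse=True,
    )[:top_k]
    peak = max(score for score, _ in scored)
    exps = [math.exp(score - peak) for score, _ in scored]
    total = sum(exps)
    return [(i, e / total) for (_, i), e in zip(scored, exps)]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def compose(atoms: list[Atom], selection: list[tuple[int, float]]) -> Matrix:
    """Assumed assembly: delta_W = sum_i w_i * (B_i @ A_i) over selected atoms."""
    d_out, d_in = len(atoms[0].B), len(atoms[0].A[0])
    delta = [[0.0] * d_in for _ in range(d_out)]
    for i, weight in selection:
        update = matmul(atoms[i].B, atoms[i].A)
        for r in range(d_out):
            for c in range(d_in):
                delta[r][c] += weight * update[r][c]
    return delta


# Two toy rank-1 atoms over a 2x2 weight matrix.
atoms = [
    Atom(key=[1.0, 0.0], A=[[1.0, 0.0]], B=[[1.0], [0.0]]),
    Atom(key=[0.0, 1.0], A=[[0.0, 1.0]], B=[[0.0], [2.0]]),
]
selection = route([1.0, 0.0], atoms, top_k=1)
delta_w = compose(atoms, selection)  # would be added to a frozen base weight
```

In this toy setup the query aligns with the first atom's key, so only that atom's update contributes to `delta_w` and the second atom cannot interfere. A real system would apply the composed update to the frozen base model's weights per layer and would learn the router rather than use a fixed similarity.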