When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

2026-06-12 • Machine Learning


AI summary

The authors explore how to update specific facts in a language model without disrupting related knowledge that should stay unchanged. They introduce a method that uses two separate adapters: one that applies an edit when it is relevant, and another that preserves the original knowledge when the edit should not apply. The system first decides, based on the input prompt, whether to apply an edit, which helps maintain accuracy across different benchmarks and models. The main improvement comes from clearly separating when to apply edits from when to preserve original knowledge.

knowledge editing, memory-assisted retrieval, parameter-efficient adapters, relevance routing, LoRA, language models, fact correction, edit memory, Llama-3, Qwen3

Authors

Yining Huang

Abstract

Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies the problem in a memory-assisted setting, where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central design question is not only how to write an edit but also when to suppress it. We introduce a route-specialized dual-adapter editor. A relevance router first decides whether a prompt should receive an edit memory. Routed prompts use an edit adapter trained to prefer the new object over the original object; unrouted non-direct prompts use a separate locality adapter trained to preserve or restore the original-object preference. We evaluate the editor on three 1,000-case protocols, CounterFact, zsRE, and MQuAKE, under the same memory protocol and with two 7B/8B base models. On Llama-3.1-8B-Instruct, it obtains the best overall probability-preference accuracy on all three benchmarks: 0.8180 on CounterFact, 0.8946 on zsRE, and 0.9922 on MQuAKE. The same trend holds on Qwen3-8B. Router ablations show that the relevant memory boundary differs across datasets: a lexical neural router is safest on CounterFact, while BGE embedding routing is better on zsRE and MQuAKE. Component and module ablations show that the gain comes mainly from separating edit injection from off-route suppression, rather than from simply increasing LoRA capacity.
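The route-then-dispatch logic described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy vectors stand in for BGE-style prompt and memory embeddings. The cosine `threshold` is a hypothetical parameter, and `EditMemory`, `route`, `choose_adapter`, and `preference_accuracy` are hypothetical names. The sketch also assumes that probability-preference accuracy counts a case as correct when the expected object receives the higher probability: the new object for edit cases and the original object for locality cases. It collapses the abstract's "unrouted non-direct" condition into "unrouted".

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical edit-memory entry: a key embedding plus the edited fact.
@dataclass
class EditMemory:
    key: Tuple[float, ...]
    subject: str
    new_object: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def route(query: Sequence[float], memory: List[EditMemory],
          threshold: float) -> Optional[EditMemory]:
    """Embedding router (assumed form): return the most similar memory
    entry if its similarity clears the threshold, otherwise None."""
    scored = [(cosine(query, m.key), m) for m in memory]
    if not scored:
        return None
    score, best = max(scored, key=lambda t: t[0])
    return best if score >= threshold else None


def choose_adapter(query: Sequence[float], memory: List[EditMemory],
                   threshold: float) -> Tuple[str, Optional[EditMemory]]:
    """Routed prompts go to the edit adapter with the retrieved memory;
    unrouted prompts go to the locality adapter (simplification: the
    direct/non-direct distinction is not modeled here)."""
    hit = route(query, memory, threshold)
    if hit is not None:
        return "edit", hit
    return "locality", None


def preference_accuracy(results: List[Tuple[float, float, bool]]) -> float:
    """Each result is (p_new, p_old, should_edit). A case counts as correct
    when the expected object has the strictly higher probability."""
    if not results:
        return 0.0
    correct = sum(
        (p_new > p_old) if should_edit else (p_old > p_new)
        for p_new, p_old, should_edit in results
    )
    return correct / len(results)


# Illustrative usage with toy, hypothetical values.
memory = [EditMemory(key=(1.0, 0.0), subject="subject_a", new_object="object_new")]
print(choose_adapter((0.9, 0.1), memory, threshold=0.8)[0])  # edit
print(choose_adapter((0.0, 1.0), memory, threshold=0.8)[0])  # locality
print(preference_accuracy([(0.7, 0.2, True), (0.1, 0.6, False), (0.4, 0.5, True)]))
```

In this sketch the routing decision is isolated in `route`, so a lexical router could replace the cosine-similarity one without touching the dispatch or the metric.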
