Improving Parametric Knowledge Access in Reasoning Language Models
2026-02-25 • Computation and Language
AI summary
The authors studied how language models remember facts stored inside them, like knowing Canberra is Australia's capital. They found that just telling a model to 'think step-by-step' helps it remember facts better but doesn't help with math problems. So, they trained the model with extra steps to reason about world knowledge by rewarding it for answering questions correctly. This training made the model better at recalling facts across several quiz-like tasks. Overall, the authors show that language models can improve how they think through and access their stored knowledge with some extra training.
language model • parametric knowledge • reinforcement learning • reasoning traces • world knowledge • TriviaQA • Natural Questions • question answering • step-by-step prompting
Authors
Melody Ma, John Hewitt
Abstract
We study reasoning for accessing world knowledge stored in a language model's parameters. For example, recalling that Canberra is Australia's capital may benefit from thinking through major cities and the concept of purpose-built capitals. While reasoning language models are trained via reinforcement learning to produce reasoning traces on tasks such as mathematics, they may not reason well for accessing their own world knowledge. We first find that models do not generate their best world knowledge reasoning by default: adding a simple "think step-by-step" cue yields a statistically significant improvement in knowledge recall but not in math. Motivated by this, we propose training models to reason over their parametric knowledge, using world-knowledge question answering as a verifiable reward. After reinforcement learning on TriviaQA (+9.9%), performance also improves on Natural Questions, HotpotQA, SimpleQA, and StrategyQA by 4.2%, 2.1%, 0.6%, and 3.0%, respectively. Reasoning models are under-optimized for parametric knowledge access, but can be easily trained to reason better.
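The verifiable reward the abstract mentions can be made concrete with a small sketch. The paper's exact reward and answer-normalization details are not given here, so the following assumes the standard exact-match convention used for TriviaQA-style evaluation (lowercasing, stripping punctuation and articles, matching against a list of gold answer aliases); the function names are illustrative, not the authors'.

```python
import re
import string

def normalize(text: str) -> str:
    """Standard QA answer normalization: lowercase, drop punctuation,
    remove articles (a/an/the), and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match_reward(model_answer: str, gold_answers: list[str]) -> float:
    """Binary verifiable reward for RL: 1.0 if the model's final answer
    exactly matches any gold alias after normalization, else 0.0."""
    pred = normalize(model_answer)
    return 1.0 if any(pred == normalize(gold) for gold in gold_answers) else 0.0

# TriviaQA-style item with answer aliases
print(exact_match_reward("Canberra", ["Canberra", "Canberra, Australia"]))  # 1.0
print(exact_match_reward("Sydney", ["Canberra"]))  # 0.0
```

Because the reward is computed from the final answer only, the reasoning trace itself is unconstrained; the model is free to discover whatever intermediate recall steps (e.g. enumerating major cities) raise its answer accuracy.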