When to Trust the Cheap Check: Weak and Strong Verification for Reasoning
2026-02-19 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors study how large language models (LLMs) are checked for correctness using two kinds of verification: weak verification, which is fast but not very accurate, and strong verification, which is very reliable but takes more effort. They create a system that decides when to trust the quick checks and when to ask for the more thorough verification. Their approach uses two thresholds to balance mistakes and verification costs, and they develop an online method that manages errors without needing specific assumptions about the model or the questions. This helps make verifying LLM outputs more efficient and trustworthy.
Large Language Models, Weak Verification, Strong Verification, Self-consistency, Proxy Rewards, Verification Policies, Calibration, Sharpness, Online Algorithms, Error Control
Authors
Shayan Kiyani, Sima Noorani, George Pappas, Hamed Hassani
Abstract
Reasoning with LLMs increasingly unfolds inside a broader verification loop. Internally, systems use cheap checks, such as self-consistency or proxy rewards, which we call weak verification. Externally, users inspect outputs and steer the model through feedback until results are trustworthy, which we call strong verification. These signals differ sharply in cost and reliability: strong verification can establish trust but is resource-intensive, while weak verification is fast and scalable but noisy and imperfect. We formalize this tension through weak-strong verification policies, which decide when to accept or reject based on weak verification and when to defer to strong verification. We introduce metrics capturing incorrect acceptance, incorrect rejection, and strong-verification frequency. At the population level, we show that optimal policies admit a two-threshold structure and that calibration and sharpness govern the value of weak verifiers. Building on this, we develop an online algorithm that provably controls acceptance and rejection errors without assumptions on the query stream, the language model, or the weak verifier.
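To make the two-threshold structure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the weak verifier returns a scalar score where higher means "more likely correct", so a response is accepted at or above an upper threshold, rejected at or below a lower threshold, and deferred to strong verification in between. The names `two_threshold_policy` and `evaluate_policy`, the score orientation, and the choice to normalize each metric by the total number of queries are all illustrative assumptions; the paper's formal definitions may differ.

```python
from typing import Dict, Sequence

ACCEPT, REJECT, DEFER = "accept", "reject", "defer"


def two_threshold_policy(score: float, lower: float, upper: float) -> str:
    """Decide from a weak-verifier score (assumed: higher = more likely correct)."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return ACCEPT  # weak check is confident the response is correct
    if score <= lower:
        return REJECT  # weak check is confident the response is wrong
    return DEFER       # ambiguous region: pay for strong verification


def evaluate_policy(
    scores: Sequence[float],
    is_correct: Sequence[bool],
    lower: float,
    upper: float,
) -> Dict[str, float]:
    """Empirical versions of the three quantities named in the abstract.

    Assumption: each rate is a fraction of all queries, and deferred
    queries never count as errors because strong verification resolves them.
    """
    if len(scores) == 0 or len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must be non-empty and aligned")
    wrong_accepts = wrong_rejects = deferrals = 0
    for score, correct in zip(scores, is_correct):
        decision = two_threshold_policy(score, lower, upper)
        if decision == DEFER:
            deferrals += 1
        elif decision == ACCEPT and not correct:
            wrong_accepts += 1
        elif decision == REJECT and correct:
            wrong_rejects += 1
    n = len(scores)
    return {
        "incorrect_acceptance": wrong_accepts / n,
        "incorrect_rejection": wrong_rejects / n,
        "strong_verification_frequency": deferrals / n,
    }


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    scores = [0.95, 0.90, 0.50, 0.10, 0.05]
    labels = [True, False, True, True, False]
    print(evaluate_policy(scores, labels, lower=0.2, upper=0.8))
```

Widening the gap between the two thresholds sends more queries to strong verification and leaves fewer chances for a wrong accept or reject; narrowing it does the opposite. This sketch uses fixed thresholds only and does not attempt to reproduce the online algorithm described in the abstract.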