Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments

2026-05-05 · Computation and Language

AI summary

The authors study how large language models sometimes fabricate false facts, known as hallucinations, which undermine their reliability. They observe that past methods either examined fine-grained internal patterns or asked the model to judge its own answers, but never both together. To address this, they propose LaaB, a new approach that connects these two views by turning the model's self-judgments back into features it can learn from, improving the detection of false statements. Experiments on several datasets and models show that LaaB outperforms existing methods.

Large Language Models · Hallucination Detection · Neural Features · Symbolic Reasoning · Meta-Judgment · Self-Judgment · Mutual Learning · Logical Consistency · Uncertainty Quantification
Authors
Hao Mi, Qiang Sheng, Shaofei Wang, Beizhe Hu, Yifan Sun, Zhengjia Wang, Hengqi Zeng, Yang Li, Danding Wang, Juan Cao
Abstract
Large Language Models (LLMs) are prone to factual hallucinations, risking their reliability in real-world applications. Existing hallucination detectors mainly extract micro-level intrinsic patterns for uncertainty quantification or elicit macro-level self-judgments through verbalized prompts. However, these methods address only a single facet of the hallucination problem, focusing either on implicit neural uncertainty or on explicit symbolic reasoning, thereby treating these inherently coupled behaviors in isolation and failing to exploit their interdependence for a holistic view. In this paper, we propose LaaB (Logical Consistency-as-a-Bridge), a framework that bridges neural features and symbolic judgments for hallucination detection. LaaB introduces a "meta-judgment" process to map symbolic labels back into the feature space. By leveraging the inherent logical bridge, whereby the response and meta-judgment labels are either identical or opposite depending on the self-judgment's semantics, LaaB aligns and integrates dual-view signals via mutual learning and enhances hallucination detection. Extensive experiments on 4 public datasets, across 4 LLMs, and against 8 baselines demonstrate the superiority of LaaB.
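
To make the label constraint concrete, below is a minimal sketch of how a self-judgment's semantics could tie the two labels together and feed a mutual-learning objective. The function names, the 0/1 label encoding, and the consistency loss are all illustrative assumptions; the abstract does not specify the paper's actual loss or encoding.

```python
import torch
import torch.nn.functional as F

def meta_judgment_label(response_label: int, judgment_says_correct: bool) -> int:
    """Logical bridge from the abstract, under an assumed encoding
    (1 = hallucinated, 0 = faithful). If the verbalized self-judgment
    claims the response is correct, that claim is false exactly when the
    response is hallucinated, so the two labels coincide; if it claims
    the response is incorrect, the labels are opposite."""
    return response_label if judgment_says_correct else 1 - response_label

def constrained_mutual_loss(p_resp: torch.Tensor,
                            p_meta: torch.Tensor,
                            judgment_says_correct: torch.Tensor) -> torch.Tensor:
    """Hypothetical mutual-learning consistency term. p_resp and p_meta
    are each detector's probability that its target (the response, resp.
    the meta-judgment) is hallucinated, shape (B,); judgment_says_correct
    is a boolean tensor of shape (B,). The label constraint turns the
    meta-view prediction into a soft target for the response view;
    swapping the roles and summing both terms would train the two views
    mutually."""
    target = torch.where(judgment_says_correct, p_meta, 1.0 - p_meta)
    return F.binary_cross_entropy(p_resp, target.detach())
```

For instance, a faithful response (label 0) whose self-judgment says "the answer is wrong" yields a meta-judgment label of 1: the self-judgment itself is the false statement, so the two labels are opposite, as the abstract describes.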