Uncertainty-Aware Transformers: Conformal Prediction for Language Models
2026-04-10 • Machine Learning
AI summary
The authors introduce CONFIDE, a method that helps transformer-based language models like BERT show how confident they are in their predictions. Instead of just giving an answer, CONFIDE uses a mathematical technique called conformal prediction on the internal parts of the model to produce sets of possible answers with explanations. This makes the model's decisions more trustworthy, especially in tricky or important situations where mistakes are costly. Their tests show CONFIDE can improve accuracy and provide better uncertainty estimates than some earlier methods. Overall, the authors offer a new way to make language models more transparent and reliable.
Transformer, BERT, RoBERTa, Conformal Prediction, Uncertainty Quantification, Encoder Embeddings, [CLS] Token, Nonconformity Scores, Model Calibration, Prediction Sets
Authors
Abhiram Vellore, Niraj K. Jha
Abstract
Transformers have had a profound impact on the field of artificial intelligence, especially on large language models and their variants. However, as was the case with earlier neural networks, their black-box nature limits trust and deployment in high-stakes settings. For models to be genuinely useful and trustworthy in critical applications, they must provide more than just predictions: they must supply users with a clear understanding of the reasoning that underpins their decisions. This article presents an uncertainty quantification framework for transformer-based language models. This framework, called CONFIDE (CONformal prediction for FIne-tuned DEep language models), applies conformal prediction to the internal embeddings of encoder-only architectures, such as BERT and RoBERTa, while enabling hyperparameter tuning. CONFIDE uses either [CLS] token embeddings or flattened hidden states to construct class-conditional nonconformity scores, enabling statistically valid prediction sets with instance-level explanations. Empirically, CONFIDE improves test accuracy by up to 4.09% on BERT-tiny and achieves greater correct efficiency (i.e., the expected size of the prediction set conditioned on it containing the true label) than prior methods, including NM2 and VanillaNN. We show that early and intermediate transformer layers often yield better-calibrated and more semantically meaningful representations for conformal prediction. In resource-constrained models and high-stakes tasks with ambiguous labels, CONFIDE offers robustness and interpretability where softmax-based uncertainty fails. We position CONFIDE as a practical framework for diagnostics and for improving efficiency and robustness over prior conformal baselines.
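To make the idea of class-conditional conformal prediction over embeddings concrete, here is a minimal sketch. It is not the authors' implementation. It rests on these assumptions: synthetic Gaussian vectors stand in for [CLS] embeddings, the nonconformity score is the mean distance to the k nearest same-class reference embeddings (one common choice, not necessarily the one CONFIDE uses), and every function name is hypothetical. The last helper computes correct efficiency as defined in the abstract.

```python
import numpy as np

def nonconformity(x, label, ref_X, ref_y, k=3):
    """Mean distance from x to its k nearest reference embeddings of class `label`.
    Assumed score for illustration; larger means x looks less like that class."""
    dists = np.linalg.norm(ref_X[ref_y == label] - x, axis=1)
    return float(np.sort(dists)[:k].mean())

def calibrate(cal_X, cal_y, ref_X, ref_y, k=3):
    """Class-conditional calibration: one array of scores per class,
    built only from calibration points whose true label is that class."""
    return {
        c: np.array([nonconformity(x, c, ref_X, ref_y, k) for x in cal_X[cal_y == c]])
        for c in np.unique(ref_y)
    }

def prediction_set(x, cal_scores, ref_X, ref_y, epsilon=0.1, k=3):
    """Keep every class whose conformal p-value exceeds the error level epsilon."""
    kept = []
    for c, scores in cal_scores.items():
        s = nonconformity(x, c, ref_X, ref_y, k)
        p_value = (np.sum(scores >= s) + 1) / (len(scores) + 1)
        if p_value > epsilon:
            kept.append(int(c))
    return kept

def correct_efficiency(pred_sets, y_true):
    """Average size of the prediction sets that contain the true label."""
    sizes = [len(s) for s, y in zip(pred_sets, y_true) if y in s]
    return float(np.mean(sizes)) if sizes else float("nan")

# Synthetic stand-in for encoder embeddings: two well-separated 8-D clusters.
rng = np.random.default_rng(0)

def sample(n):
    X = np.vstack([rng.normal(0.0, 1.0, (n, 8)), rng.normal(10.0, 1.0, (n, 8))])
    y = np.repeat([0, 1], n)
    return X, y

ref_X, ref_y = sample(50)    # reference embeddings (e.g. from training data)
cal_X, cal_y = sample(50)    # held-out calibration embeddings
test_X, test_y = sample(20)  # test embeddings

cal_scores = calibrate(cal_X, cal_y, ref_X, ref_y)
test_sets = [prediction_set(x, cal_scores, ref_X, ref_y, epsilon=0.1) for x in test_X]
coverage = float(np.mean([y in s for s, y in zip(test_sets, test_y)]))
print("empirical coverage:", coverage)
print("correct efficiency:", correct_efficiency(test_sets, test_y))
```

With a real model, `ref_X`, `cal_X`, and `test_X` would be replaced by embeddings extracted from a chosen transformer layer; the abstract's finding that early and intermediate layers can work better suggests treating the layer index as a tunable choice.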