Uncertainty-Aware Generation and Decision-Making Under Ambiguity
2026-06-29 • Computation and Language
Computation and Language • Machine Learning
AI summary
The authors explore how to make decisions more reliably when using Large Language Models (LLMs) for tasks that require careful judgment, such as tutoring and peer reviewing. They handle uncertainty with ideas from decision theory, specifically Bayesian and risk-averse decision rules. They also use conformal prediction to provide guarantees over the chosen tutoring strategy or review score. Their experiments show that these methods can make generated outputs more useful, but they must be implemented carefully when the situation is highly ambiguous. In particular, risk-averse approaches can produce less useful, generic answers, while Bayesian methods usually perform better. The authors also outline open challenges for LLM-based decision-making.
Large Language Models, Bayesian decision theory, risk-averse decision making, uncertainty quantification, conformal prediction, tutoring systems, peer reviewing, decision algorithms, subjectivity in AI, model trustworthiness
Authors
Nico Daheim, Iryna Gurevych
Abstract
With rapidly improving capabilities, Large Language Models (LLMs) are increasingly used in many complex real-world tasks. Beyond requiring in-depth knowledge and reasoning skills, many of these tasks exhibit a high degree of subjectivity and require that the outputs of the model can be trusted. While much progress has been made in training better models, decision-making algorithms have received less attention. In this work, we present and evaluate various uncertainty-aware decision-making algorithms based on Bayesian decision theory and risk-averse decision making on the tasks of tutoring and automatic peer reviewing. Concretely, we take uncertainty over tutoring strategies and review scores into account when generating a tutor response or review, and we use conformal prediction to provide guarantees over strategy and score. We find empirically that these algorithms can improve the utility of the generations but need to be carefully implemented when ambiguity is high. For example, risk-averse rules can degrade performance by optimizing for generic outputs, while Bayesian methods tend to perform better. Our work uses techniques from decision theory to improve LLM-based decision-making and outlines open challenges for the community.
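To make the contrast between the two decision rules concrete, here is a minimal sketch. It is not the authors' implementation; every name and number in it is hypothetical.

The sketch assumes the following setup:
- A latent variable z (for example, a tutoring strategy or a review score) takes one of a few discrete labels, and the model supplies a probability p(z | x) for each.
- A few candidate generations y have known utilities u(y, z).
- The Bayesian rule maximizes expected utility under p(z | x).
- The risk-averse rule is taken to be maximin over a split-conformal prediction set. This is one possible risk-averse rule, chosen here for illustration.

The conformal set uses the standard score 1 - p(z | x). Under exchangeability of calibration and test data, this set contains the true label with probability at least 1 - alpha.

```python
import math
from typing import Dict, List, Set

# Hypothetical setup: the labels "A", "B", "C" stand in for latent tutoring strategies
# (or review scores); the utilities and probabilities below are illustrative only.


def conformal_threshold(cal_scores: List[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set must contain every label
    return sorted(cal_scores)[k - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """Keep every label whose nonconformity score 1 - p(z | x) is within the threshold."""
    return {z for z, p in probs.items() if 1.0 - p <= qhat}


def bayes_choice(utility: Dict[str, Dict[str, float]], probs: Dict[str, float]) -> str:
    """Bayes rule: pick the candidate with the highest expected utility under p(z | x)."""
    return max(utility, key=lambda y: sum(probs[z] * utility[y][z] for z in probs))


def maximin_choice(utility: Dict[str, Dict[str, float]], conf_set: Set[str]) -> str:
    """Risk-averse rule (assumed maximin): best worst-case utility over the conformal set."""
    return max(utility, key=lambda y: min(utility[y][z] for z in conf_set))


# Hypothetical nonconformity scores 1 - p(true label) on a held-out calibration set.
cal_scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8]
alpha = 0.2  # target miscoverage rate

# Hypothetical model belief over the latent label for one test input.
probs = {"A": 0.6, "B": 0.35, "C": 0.05}

# Hypothetical utilities u(y, z): two label-specific candidates and one generic one.
utility = {
    "specific_A": {"A": 1.0, "B": 0.1, "C": 0.1},
    "specific_B": {"A": 0.1, "B": 1.0, "C": 0.1},
    "generic": {"A": 0.4, "B": 0.4, "C": 0.4},
}

qhat = conformal_threshold(cal_scores, alpha)
conf_set = prediction_set(probs, qhat)

print(qhat)                              # 0.7
print(sorted(conf_set))                  # ['A', 'B']
print(bayes_choice(utility, probs))      # specific_A
print(maximin_choice(utility, conf_set)) # generic
```

In this toy case, the Bayesian rule commits to the candidate tailored to the most likely label, with an expected utility of 0.64 versus 0.4 for the generic one. The maximin rule instead picks the generic candidate, because it is the only one without a bad worst case over {A, B}. This mirrors, in miniature, the failure mode the abstract attributes to risk-averse rules. Whether the paper's risk-averse rules take this exact form is not stated here.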