CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space
2026-04-10 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors point out that current tests of decision-making by large language models (LLMs) rely on simplifying assumptions that don't match real-world complexity. These tests usually pick actions from a small, fixed set and ignore rules that limit which actions are possible. To fix this, the authors created CONDESION-BENCH, a new test where actions are made up of parts and must follow specific rules. Their benchmark checks both how good the decisions are and whether they follow these rules, offering a more detailed way to see how well LLMs can help with decisions.
large language models, decision-making benchmarks, compositional actions, conditional decision-making, oracle evaluation, decision-support tools, action feasibility, contextual understanding
Authors
Yeonjun Hwang, Sungyong Park, Minju Kim, Dongha Lee, Jinyoung Yeo
Abstract
Large language models have been widely explored as decision-support tools in high-stakes domains due to their contextual understanding and reasoning capabilities. However, existing decision-making benchmarks rely on two simplifying assumptions: actions are selected from a finite set of pre-defined candidates, and explicit conditions restricting action feasibility are not incorporated into the decision-making process. These assumptions fail to capture the compositional structure of real-world actions and the explicit conditions that constrain their validity. To address these limitations, we introduce CONDESION-BENCH, a benchmark designed to evaluate conditional decision-making in compositional action space. In CONDESION-BENCH, actions are defined as allocations to decision variables and are restricted by explicit conditions at the variable, contextual, and allocation levels. By employing oracle-based evaluation of both decision quality and condition adherence, we provide a more rigorous assessment of LLMs as decision-support tools.
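To make the idea of a compositional action restricted by explicit conditions concrete, here is a minimal Python sketch. It is illustrative only and is not the benchmark's data format or evaluation code: the `Condition` class, the `violated` helper, the budget scenario, and the reading of what a "variable", "contextual", or "allocation" level condition looks like are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An action is an allocation: decision variable name -> allocated amount.
Action = Dict[str, float]


@dataclass
class Condition:
    """Hypothetical explicit condition on an action (not the paper's schema)."""
    level: str          # "variable", "contextual", or "allocation"
    description: str
    check: Callable[[Action, dict], bool]  # True when the condition holds


def violated(action: Action, context: dict, conditions: List[Condition]) -> List[str]:
    """Return descriptions of every condition the action fails."""
    return [c.description for c in conditions if not c.check(action, context)]


# Assumed toy scenario: split a budget across three channels.
conditions = [
    Condition("variable", "every amount is non-negative",
              lambda a, ctx: all(v >= 0 for v in a.values())),
    Condition("contextual", "no TV spend when the context forbids TV",
              lambda a, ctx: ctx.get("tv_allowed", True) or a.get("tv", 0.0) == 0.0),
    Condition("allocation", "amounts sum to the budget",
              lambda a, ctx: abs(sum(a.values()) - ctx["budget"]) < 1e-9),
]

context = {"budget": 100.0, "tv_allowed": False}
feasible = {"search": 60.0, "social": 40.0, "tv": 0.0}
infeasible = {"search": 60.0, "social": 30.0, "tv": 20.0}

print(violated(feasible, context, conditions))
print(violated(infeasible, context, conditions))
```

In this sketch the second allocation fails two checks: it spends on TV although the context forbids it, and its amounts sum to 110 rather than the budget of 100. A check of this kind covers only condition adherence; judging decision quality would need a separate scoring step, which the abstract attributes to an oracle and which is not modeled here.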