CREATE: Testing LLMs for Associative Creativity
2026-03-10 • Computation and Language
AI summary
The authors developed CREATE, a test to see how well AI models can come up with creative connections between ideas. Models have to make many unique and meaningful links between concepts, and they get better scores for producing a wide variety of these links. The task is hard because there's a huge number of possible connections, similar to real creative thinking like coming up with new hypotheses. The authors found that even the best models struggle to fully master the task, and some newer techniques only help a little. CREATE is meant to help researchers build better AI that can think more creatively.
associative reasoning, creative reasoning, AI benchmark, concept connections, diversity in outputs, specificity, hypothesis generation, model evaluation, prompting techniques, search space
Authors
Manya Wadhwa, Tiasa Singha Roy, Harvey Lederman, Junyi Jessy Li, Greg Durrett
Abstract
A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.