Human Adults and LLMs as Scientists: Who Benefits from Active Exploration?

2026-06-04 · Computation and Language

AI summary

The authors studied how adults learn about causes when several conditions must occur together (conjunctive rules) versus when any one condition is enough (disjunctive rules). They found that when adults can actively test objects themselves, they become much better at identifying conjunctive causes, although these still take more tests to work out than disjunctive ones. They also tested large language models and found that some could infer the rules about as accurately as humans, but these models explored less efficiently and showed a similar gap between conjunctive and disjunctive rules. This suggests that giving people control over how they gather evidence helps them learn about causes that act in combination.

causal learning, conjunctive rules, disjunctive rules, active exploration, blicket detector, causal reasoning, large language models, hypothesis inference, intervention, evidence generation
Authors
Mandana Samiei, Eunice Yiu, Anthony GX-Chen, Dongyan Lin, Jocelyn Shen, Blake A. Richards, Alison Gopnik, Doina Precup
Abstract
A long-standing finding in the causal learning literature is that adults struggle to identify conjunctive causal rules, where an effect requires the simultaneous presence of multiple causes, while performing better in disjunctive settings. However, most demonstrations of this "conjunctive handicap" rely on passive observation paradigms with limited evidence, where learners have no control over evidence generation. This paper asks whether this bias persists when adults are granted agency through active exploration. Using a modified "blicket detector" task, adult participants freely intervened to identify causal objects under conjunctive or disjunctive rule structures. We show that active exploration substantially improves adults' conjunctive causal reasoning, although conjunctive rules still require more tests to infer than disjunctive rules. We further compare human performance to a range of large language models in the same setting. While some state-of-the-art models approach human-level performance on hypothesis inference accuracy, they often exhibit less efficient exploration strategies and similar conjunctive-disjunctive performance gaps.
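To make the two rule structures concrete, here is a minimal toy sketch of a blicket detector with an active learner. It is not the paper's task or analysis; the setup is assumed for illustration only: four objects (hypothetically labelled A to D), exactly two of which are blickets, and a learner who already knows which rule is in force. Under the disjunctive rule the detector activates if any blicket is placed on it; under the conjunctive rule it activates only if both blickets are on it together. The helper names (`detector_fires`, `consistent`, `explore`) and the greedy "minimise the worst-case number of remaining hypotheses" strategy are hypothetical choices, not the authors' method.

```python
from itertools import combinations

# Hypothetical toy setup (assumption, not the paper's exact task):
# four objects, exactly two of which are blickets.
OBJECTS = ("A", "B", "C", "D")
HYPOTHESES = [frozenset(pair) for pair in combinations(OBJECTS, 2)]


def detector_fires(placed, blickets, rule):
    """Disjunctive: any blicket on the detector is enough.
    Conjunctive: every blicket must be on the detector at the same time."""
    if rule == "disjunctive":
        return bool(placed & blickets)
    if rule == "conjunctive":
        return blickets <= placed
    raise ValueError(f"unknown rule: {rule!r}")


def consistent(observations, rule):
    """Hypotheses that reproduce every (placed, fired) observation so far."""
    return [
        h for h in HYPOTHESES
        if all(detector_fires(placed, h, rule) == fired
               for placed, fired in observations)
    ]


def all_interventions():
    """Every non-empty set of objects the learner could place on the detector."""
    return [frozenset(c)
            for k in range(1, len(OBJECTS) + 1)
            for c in combinations(OBJECTS, k)]


def explore(true_blickets, rule):
    """Greedy active learner (assumed strategy): each test is the placement that
    minimises the worst-case number of hypotheses left; stop when one remains.
    Returns (identified blicket set, number of tests used)."""
    observations = []
    remaining = consistent(observations, rule)
    while len(remaining) > 1:
        def worst_case(placed):
            n_fire = sum(detector_fires(placed, h, rule) for h in remaining)
            return max(n_fire, len(remaining) - n_fire)

        test = min(all_interventions(), key=worst_case)
        if worst_case(test) == len(remaining):
            raise RuntimeError("no remaining test can separate the hypotheses")
        observations.append((test, detector_fires(test, true_blickets, rule)))
        remaining = consistent(observations, rule)
    return remaining[0], len(observations)


if __name__ == "__main__":
    truth = frozenset({"A", "B"})
    for rule in ("disjunctive", "conjunctive"):
        found, n_tests = explore(truth, rule)
        print(rule, sorted(found), n_tests)
```

In this small hypothesis space both rules can be solved in at most three tests, so the sketch does not reproduce the paper's finding that conjunctive rules need more tests. What it does illustrate is why conjunctive rules demand a different exploration strategy: testing one object at a time fully identifies the blickets under the disjunctive rule, but under the conjunctive rule a single object never activates the detector, so an efficient learner has to place several objects on it at once.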