AI Exposure Scores: what they measure, what they miss, and what comes next

2026-06-22

Artificial Intelligence
AI summary

The authors discuss a 2023 study by Eloundou et al. that created scores to estimate how much large language models can assist with different jobs. They explain that while this method is useful, the scores have limits when applied to real-world policy decisions because they don’t change over time or account for local differences. The authors identify two main problems: a mismatch between what these scores measure and what policy questions actually require, and a lack of coordination between researchers and policymakers. They suggest improvements like updating methods and involving workers more, but emphasize that better measurement alone can’t close the gap between research and policy.

exposure scores, large language models, future of work, policy analysis, methodological limitations, occupational tasks, research-policy gap, participatory methods, dynamic measures, data infrastructure
Authors
Campbell Lund, Thomas Euyang, Zanele Munyikwa, Marzieh Fadaee
Abstract
A set of exposure scores calculated in 2023 has become a central empirical input to the future of work debate. Produced by Eloundou et al. (2023) and referred to here as the GPTs are GPTs scores, they define exposure as the share of occupational tasks a large language model can assist with. This work is a genuine methodological contribution, but as the scores travel beyond the time and place in which they were produced, the limitations the authors named do not always travel with them. Two gaps have widened as a result. The first is structural, between what static exposure scores measure and what policy questions actually require. Taking the diffusion of these scores as a case study, we show how their temporal, geographic, and ontological limitations compound in policy-facing analyses, and we survey five families of research responding to these limits: dynamic and benchmark-based measures, ensemble methods, task-framework extensions, worker-centered metrics, and adoption and usage data. The second gap is the one we argue needs more attention: the coordination between researchers and policymakers. Policy-relevant work, which asks who is harmed, who benefits, how, and when, continues to reference the static GPTs are GPTs scores without engaging with the methodological updates that would let these questions be answered more reliably. We then ask what additional steps towards navigating uncertainty remain: ex-post frameworks and the deliberate, political work of reimagining which futures are worth building towards. Closing the research-policy gap is a shared task: policymakers must widen their evidence base, engage workers as epistemic partners, and shift from prediction to preparedness; researchers must build data infrastructure, adopt participatory methods, and write with policymakers in mind. Better measurement matters, but it will not close the second gap alone.