Scaffolding Human-AI Collaboration: A Field Experiment on Behavioral Protocols and Cognitive Reframing
2026-04-09 • Human-Computer Interaction
AI summary
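The authors studied how giving people different kinds of support when using an AI writing tool affected their work. Requiring pairs to follow a strict joint-use process with the AI was associated with lower-quality documents and far fewer of them. Training people to think of the AI as a thought partner was associated with higher quality among the best individual documents. Trained participants' views about AI became more positive over the session, but further analysis suggests this likely reflects recovery from earlier carry-over effects rather than the training itself. The study has limitations, including a morning-versus-afternoon session confound, uneven dropout between groups, and an AI grader whose scores were sensitive to document length.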
Generative AI, Human-AI collaboration, Field experiment, Behavioral scaffolding, Cognitive scaffolding, Document quality, Partnership training, AI productivity, LLM grading, Attrition
Authors
Alex Farach, Alexia Cambon, Lev Tankelevitch, Connie Hsueh, Rebecca Janssen
Abstract
Organizations have widely deployed generative AI tools, yet productivity gains remain uneven, suggesting that how people use AI matters as much as whether they have access. We conducted a field experiment with 388 employees at a Fortune 500 retailer to test two scaffolding interventions for human-AI collaboration. All participants had access to the same AI tool; we varied only the structure surrounding its use. A behavioral scaffolding intervention (a structured protocol requiring joint AI use within pairs) was associated with lower document quality relative to unstructured use and substantially lower document production. A cognitive scaffolding intervention (partnership training that reframed AI as a thought partner) was associated with higher individual document quality at the top of the distribution. Treatment participants also showed a greater positive change in their beliefs about AI across the session, though sensitivity analyses suggest this likely reflects recovery from carry-over effects rather than genuine training-induced shifts. Both findings are subject to design limitations, including an AM/PM session confound, differential attrition, and LLM grading sensitivity to document length.