Fast and Forgettable: A Controlled Study of Novices' Performance, Learning, Workload, and Emotion in AI-Assisted and Human Pair Programming Paradigms

2026-04-20

Human-Computer Interaction
AI summary

The authors studied how students write code when working with either a human partner or GitHub Copilot, an AI tool, to see which helps more under time pressure. They found that students performed better and reported lower workload with Copilot, but working with a human teammate made them feel more positive and engaged. When tested again a week later, students who had used AI showed slightly worse performance, although the difference was not significant. The authors suggest educators should keep using pair programming alongside AI rather than replacing humans with AI completely.

code generation, pair programming, GitHub Copilot, workload, self-efficacy, programming education, retention test, performance measurement, emotional response, human-AI collaboration
Authors
Nicholas Gardella, James Prather, Juho Leinonen, Paul Denny, Raymond Pettit, Sara L. Riggs
Abstract
Code-generating Artificial Intelligence has gained popularity within both professional and educational programming settings over the past several years. While research and pedagogy are beginning to cope with this change, computing students are left to bear the unforeseen consequences of AI amidst a dearth of empirical evidence about its effects. Though pair programming between students is well studied and known to be beneficial to self-efficacy and academic achievement, it remains underutilized and is further threatened by the proposition that AI can replace a human programming partner. In this paper, we present a controlled pair programming study with 22 participants who wrote Python code under time pressure in teams of two and individually with GitHub Copilot for 20 minutes each. They were incentivized by bonus compensation to balance performance with understanding and were retested individually on the programming tasks after a retention interval of one week. Subjective measures of workload and emotion as well as objective measures of performance and learning (retest performance) were collected. Results showed that participants performed significantly better with GitHub Copilot than with their human teammate, and several dimensions of their workload were significantly reduced. However, the emotional effect of the human teammate was significantly more positive and arousing than that of working with Copilot. Furthermore, retest performance in the AI condition was lower in absolute terms, though not significantly so, and the decrement from initial to retest performance was larger in the AI condition. We recommend that educators strongly consider revisiting pair programming as an educational tool in addition to embracing modern AI.