From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests

2026-04-03
Software Engineering

AI summary

The authors studied how automated code reviewers, called code review agents (CRAs), affect whether code changes get accepted or abandoned. They found that pull requests reviewed only by CRAs were merged less often (45%) than those reviewed only by humans (68%), and that much of the CRAs' feedback contained substantial noise, making it less useful. This suggests that automated reviews alone often produce unclear or low-quality feedback, which can lead to more abandoned code changes. The authors recommend that CRAs be used to assist humans rather than replace them, since human involvement remains important for effective code review.

Autonomous coding agents, Pull requests, Code review agents (CRA), Code review, Merge rate, Signal-to-noise ratio, Abandoned pull requests, Human oversight, Automated feedback, Software development workflows
Authors
Kowshik Chowdhury, Dipayan Banik, K M Ferdous, Shazibul Islam Shamim
Abstract
Autonomous coding agents are generating code at an unprecedented scale, with OpenAI Codex alone creating over 400,000 pull requests (PRs) in two months. As agentic PR volumes increase, code review agents (CRAs) have become routine gatekeepers in development workflows. Industry reports claim that CRAs can manage 80% of PRs in open source repositories without human involvement. As a result, understanding the effectiveness of CRA reviews is crucial for maintaining development workflows and preventing wasted effort on abandoned pull requests. However, empirical evidence on how CRA feedback quality affects PR outcomes remains limited. The goal of this paper is to help researchers and practitioners understand when and how CRAs influence PR merge success by empirically analyzing reviewer composition and the signal quality of CRA-generated comments. From AIDev's 19,450 PRs, we analyze 3,109 unique PRs in the commented review state, comparing human-only versus CRA-only reviews. We examine 98 closed CRA-only PRs to assess whether low signal-to-noise ratios contribute to abandonment. CRA-only PRs achieve a 45.20% merge rate, 23.17 percentage points lower than human-only PRs (68.37%), with significantly higher abandonment. Our signal-to-noise analysis reveals that 60.2% of closed CRA-only PRs fall into the 0-30% signal range, and 12 of 13 CRAs exhibit average signal ratios below 60%, indicating substantial noise in automated review feedback. These findings suggest that CRAs without human oversight often generate low-signal feedback associated with higher abandonment. For practitioners, our results indicate that CRAs should augment rather than replace human reviewers and that human involvement remains critical for effective and actionable code review.
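The two headline measures in the abstract, merge rate by reviewer composition and per-PR signal-to-noise ratio, can be sketched as simple aggregations. This is an illustrative sketch only: the field names (`reviewer_type`, `merged`, `signal_comments`, `total_comments`) are hypothetical and do not reflect the AIDev dataset schema or the authors' actual pipeline.

```python
# Hypothetical sketch of the paper's two metrics. Field names are
# illustrative assumptions, not the AIDev dataset schema.

def merge_rate(prs, reviewer_type):
    """Fraction of PRs with the given reviewer composition that were merged."""
    subset = [pr for pr in prs if pr["reviewer_type"] == reviewer_type]
    return sum(pr["merged"] for pr in subset) / len(subset)

def signal_ratio(pr):
    """Share of a PR's review comments judged actionable ('signal')."""
    if pr["total_comments"] == 0:
        return 0.0
    return pr["signal_comments"] / pr["total_comments"]

# Toy data standing in for labeled PRs.
prs = [
    {"reviewer_type": "cra_only", "merged": True,
     "signal_comments": 1, "total_comments": 5},
    {"reviewer_type": "cra_only", "merged": False,
     "signal_comments": 0, "total_comments": 4},
    {"reviewer_type": "human_only", "merged": True,
     "signal_comments": 3, "total_comments": 4},
]

print(merge_rate(prs, "cra_only"))    # 0.5
print(merge_rate(prs, "human_only"))  # 1.0

# PRs in the 0-30% signal band, analogous to the paper's lowest bin.
low_signal = [pr for pr in prs
              if pr["reviewer_type"] == "cra_only"
              and signal_ratio(pr) <= 0.30]
print(len(low_signal))  # 2
```

The paper's comparison then reduces to differencing the two merge rates (45.20% vs. 68.37% in the study) and examining how closed CRA-only PRs distribute across signal-ratio bins.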