Towards Automating Scientific Review with Google's Paper Assistant Tool

2026-06-26 • Machine Learning

Machine Learning, Artificial Intelligence, Computation and Language, Computers and Society

AI summary

The authors explain that AI is speeding up scientific research, but it also creates a problem because human reviewers can't keep up with checking all the new work. To help, they suggest using AI to assist with reviewing papers as well. They created a tool called PAT that reads full scientific papers and checks for mistakes, suggests ways to improve, and validates results. Early tests show PAT finds more errors than simpler AI methods and helps researchers fix problems before formal review, making the review process easier for human reviewers.

artificial intelligence, peer review, scientific verification, agentic AI, inference scaling, mathematical error detection, SPOT benchmark, STOC conference, ICML conference, research paper evaluation

Authors

Rajesh Jayaram, Drew Tyler, David Woodruff, Corinna Cortes, Yossi Matias, Vahab Mirrokni, Vincent Cohen-Addad

Abstract

Artificial intelligence is driving a revolution in scientific discovery, accelerating everything from hypothesis generation to mathematical theorem proving. However, this rapid acceleration is creating a systemic challenge: traditional human peer review cannot scale to match the influx of AI-assisted science. Ultimately, to resolve this tension, we must also deploy AI to accelerate the verification and review process itself. To frame the discussion around this transition, we propose a taxonomy consisting of four progressive levels of AI-human collaboration in scientific evaluation, and discuss various trade-offs involved with each. As a step toward this future, we introduce the Paper Assistant Tool (PAT), an agentic AI framework built for deep scientific review and verification. PAT ingests full scientific manuscripts and produces a comprehensive evaluation, checking theoretical results, validating experiments, suggesting improvements, and identifying potential flaws. By utilizing inference scaling techniques, PAT is able to identify deeper issues than a single model call alone, achieving a 34% improvement over zero-shot recall on mathematical errors in the SPOT benchmark. Pilot deployments of PAT as a pre-submission tool for authors at two major Computer Science conferences -- STOC and ICML -- demonstrate its ability to identify critical errors and suggest substantive improvements to research papers. By catching errors early, PAT eases the cognitive burden placed on referees, while preserving their control over the outcomes of the review process.
