Measuring the Gap Between Human and LLM Research Ideas

2026-07-01 • Computation and Language

Computation and Language • Artificial Intelligence

AI summary

The authors studied how research ideas generated by large language models (LLMs) compare to those of human researchers. Starting from real research papers, they identified the prior works that likely inspired each paper and prompted LLMs with those works to generate a related new idea. They then labeled each idea with a two-axis taxonomy covering its opportunity pattern and its research paradigm. The resulting profiles show that LLM ideas fall into a narrower range than human ideas and are distributed differently across categories. Specifically, LLMs often focus on combining existing concepts, while human ideas cover a wider variety of ways to frame research problems. This suggests that LLMs can come up with reasonable ideas, but their creative range is still limited compared to that of humans.

Large Language Models, Research Ideation, Prior Work, Idea Generation, Research Paradigm, Opportunity Pattern, Novelty, Feasibility, Taxonomy, Distributional Gap

Authors

Ziyu Chen, Yilun Zhao, Arman Cohan

Abstract

LLMs are increasingly used to brainstorm research ideas, but existing evaluations mostly judge individual ideas by novelty, feasibility, or expert preference. We instead ask: how far are current LLM-generated ideas from those of human researchers? To characterize this gap, we build a large-scale evaluation framework for ideation from high-quality human research papers. For each paper, we reverse-engineer a small set of closely related prior works that likely inspired its core idea. LLMs are then prompted to generate a new idea from the set of paper titles and summaries. We introduce a two-axis research-taste taxonomy to profile each idea by its opportunity pattern and research paradigm, and use it to quantify the divergence between human and LLM ideas. Across idea sets generated by different LLMs, we observe a consistent distributional gap: LLM ideas are disproportionately concentrated around bridge-like opportunities and synthesis methods, whereas the human paper reference distribution spreads more broadly across ways of framing gaps and constructing contributions. This result suggests that strong LLMs can produce a range of reasonable ideas, but that range remains narrower than, and systematically shifted relative to, human research taste.
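
The abstract does not say which divergence measure is used to compare the human and LLM category distributions. As a rough illustration only, the sketch below assumes Jensen-Shannon divergence over one taxonomy axis. Apart from "bridge", which echoes the abstract's "bridge-like opportunities", the category names are hypothetical, as are the toy labels and the helper names `label_distribution` and `js_divergence`. None of this is the authors' implementation.

```python
from collections import Counter
from math import log2


def label_distribution(labels, categories):
    """Turn a list of category labels into a probability vector over `categories`."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors.

    Assumed here as the divergence measure; the source does not name one.
    The result lies in [0, 1]: 0 for identical distributions, 1 for disjoint ones.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical opportunity-pattern categories; only "bridge" echoes the abstract.
categories = ["bridge", "gap", "reframe", "scale"]

# Toy labels, invented for illustration (not data from the paper).
human_labels = ["bridge", "gap", "gap", "reframe", "scale", "reframe", "bridge", "scale"]
llm_labels = ["bridge", "bridge", "bridge", "bridge", "bridge", "gap", "bridge", "reframe"]

human_dist = label_distribution(human_labels, categories)
llm_dist = label_distribution(llm_labels, categories)

print("human:", human_dist)
print("llm:  ", llm_dist)
print("JS divergence:", js_divergence(human_dist, llm_dist))
```

A symmetric, bounded measure is convenient here because neither distribution is a natural reference and some categories may be empty for one group. The same comparison could be repeated on the research-paradigm axis.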
