Computational conceptual history of scientific concepts: From early digital methods to LLMs

2026-06-02
Computation and Language

AI summary

The authors explore how large language models (LLMs) fit into the history of using computers to study concepts in the history, philosophy, and sociology of science. They first explain older digital methods for analyzing how words and ideas change over time. Then they discuss what LLMs add, which problems they share with earlier methods, and recent examples of their use. Throughout, they focus on challenges such as choosing the right text collections, how models are built and trained, and how results are evaluated and interpreted.

large language models, concept analysis, lexical semantic change, computational history, corpus construction, operationalization, model training, evaluation, history of science, digital humanities
Authors
Michael Zichert, Arno Simons
Abstract
This article situates large language models (LLMs) within the longer history of computational approaches to concept analysis in the history, philosophy, and sociology of science (HPSS). We examine what LLMs add to existing methods and how they inherit longstanding problems, and we review recent case studies that employ them. In the first part, we reconstruct computational conceptual history before LLMs by bringing together three strands of work: early digital methods in HPSS, distributional approaches from digital history and related research, and lexical semantic change detection. We provide an overview of the main challenges and opportunities, focusing on corpus construction, operationalization and modelling choices, and evaluation and interpretation. In the second part, we turn to the era of LLMs, starting with a short introduction to LLMs before reviewing LLM-based work on lexical semantic change detection and relevant case studies in HPSS. We then revisit the earlier methodological questions, showing how issues of corpus construction, model choice and training data, operationalization trade-offs, and evaluation and interpretation play out in LLM-based workflows.