An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
2026-04-09 • Artificial Intelligence
Artificial Intelligence • Computation and Language • Computers and Society • Multiagent Systems
AI summary
The authors built a system of several cooperating AI agents that checks history textbooks for implicit bias, nationalist framing, and selective omissions, without attributing the content of quoted historical sources to the textbook itself. They tested it on Romanian upper-secondary history textbooks and found that it penalized content less harshly than a zero-shot baseline (mean severity 2.9/7 versus 5.4/7). In a blind comparison, human evaluators preferred the system's Independent Deliberation configuration in 64.8% of cases. At roughly $2 per textbook, the approach is affordable enough to support systematic textbook review by schools and educators.
agentic evaluation, multimodal screening agent, source attribution protocol, historical bias, textbook evaluation, zero-shot baseline, pedagogical acceptability, meta-agent, human-in-the-loop, educational governance
Authors
Gabriel Stefan, Adrian-Marius Dumitran
Abstract
History textbooks often contain implicit biases, nationalist framing, and selective omissions that are difficult to audit at scale. We propose an agentic evaluation architecture comprising a multimodal screening agent, a heterogeneous jury of five evaluative agents, and a meta-agent for verdict synthesis and human escalation. A central contribution is a Source Attribution Protocol that distinguishes textbook narrative from quoted historical sources, preventing the misattribution that causes systematic false positives in single-model evaluators. In an empirical study on Romanian upper-secondary history textbooks, 83.3% of 270 screened excerpts were classified as pedagogically acceptable (mean severity 2.9/7), versus 5.4/7 under a zero-shot baseline, demonstrating that agentic deliberation mitigates over-penalization. In a blind human evaluation (18 evaluators, 54 comparisons), the Independent Deliberation configuration was preferred in 64.8% of cases over both a heuristic variant and the zero-shot baseline. With a cost of approximately $2 per textbook, these results position agentic evaluation architectures as economically viable decision-support tools for educational governance.
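The flow described in the abstract (source attribution, then a five-member jury, then a meta-agent that synthesizes a verdict or escalates to a human) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Segment`, `Verdict`, `narrative_only`, and `evaluate_excerpt` are hypothetical, the jurors are plain functions standing in for model calls, and the median aggregation, the disagreement threshold, and the lowest score of 1 are assumptions made only for illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, List

# Hypothetical stand-in for an evaluative agent: maps text to a severity out of 7.
Juror = Callable[[str], int]

MIN_SEVERITY = 1  # assumption: the abstract does not state the lowest score


@dataclass
class Segment:
    text: str
    is_quoted_source: bool  # True for a quoted historical source, False for textbook narrative


@dataclass
class Verdict:
    severity: float
    escalate: bool  # True when a human reviewer should look at the excerpt


def narrative_only(segments: List[Segment]) -> str:
    """Source attribution step: keep only the textbook's own narrative."""
    return " ".join(s.text for s in segments if not s.is_quoted_source)


def evaluate_excerpt(segments: List[Segment], jurors: List[Juror],
                     spread_threshold: int = 3) -> Verdict:
    """Score the narrative with every juror, then synthesize one verdict.

    Assumed meta-agent rule: take the median score, and escalate when the
    jurors disagree by at least `spread_threshold` points.
    """
    narrative = narrative_only(segments)
    if not narrative.strip():
        # Nothing was written by the textbook authors, so nothing to penalize.
        return Verdict(severity=MIN_SEVERITY, escalate=False)
    scores = [juror(narrative) for juror in jurors]
    spread = max(scores) - min(scores)
    return Verdict(severity=median(scores), escalate=spread >= spread_threshold)


# Toy usage: the loaded wording appears only inside a quoted source.
segments = [
    Segment("The treaty was signed in 1920.", is_quoted_source=False),
    Segment("Our glorious nation crushed the barbarians.", is_quoted_source=True),
]


def keyword_juror(text: str) -> int:
    # Toy rule, not a real bias detector.
    return 6 if "glorious" in text else 2


jurors = [keyword_juror] * 5
verdict = evaluate_excerpt(segments, jurors)

# Without attribution, the same jurors would score the quoted wording too.
unattributed = [Segment(s.text, is_quoted_source=False) for s in segments]
naive_verdict = evaluate_excerpt(unattributed, jurors)
```

In this toy case the attributed excerpt receives a low severity because the loaded phrase belongs to the quoted source, while the unattributed version is penalized for it. That is the kind of false positive the Source Attribution Protocol is described as preventing.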