Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features

2026-06-02 | Computation and Language

Computation and Language, Machine Learning
AI summary

The authors studied how well fake news detectors work when the AI that creates fake articles is given different instructions (prompts). They used three groups of AI-written articles, each produced with a different prompt, and combined them with real news. Using simple language features such as vocabulary variety, how easy the text is to read, and emotion levels, they trained a random forest classifier to spot AI-generated fake news. The model stayed highly accurate when tested on prompts it had not seen during training (AUC between 0.988 and 1.000 across all six train-test combinations), which suggests these features are reliable indicators across different AI writing styles.

large language models, fake news detection, prompting strategies, lexical diversity, readability, emotional intensity, random forest classifier, cross-prompt generalization
Authors
Aya Vera-Jimenez, Samuel Jaeger, Calvin Ibenye, Dhrubajyoti Ghosh
Abstract
The increasing use of large language models has raised concerns about the spread of AI-generated fake news, particularly under varying prompting strategies. Most existing detection models are trained and evaluated under a single generation setting, leaving their ability to generalize across unseen prompts unclear. In this study, we investigate cross-prompt generalization in fake news detection using three datasets of AI-generated articles produced under distinct prompts, combined with real news articles. We extract interpretable linguistic features capturing lexical diversity, readability, and emotion-based characteristics and evaluate a random forest classifier under a cross-prompt framework, where models trained on one prompt are tested on another. Across all six train-test combinations, performance remains consistently high, with AUC values ranging from 0.988 to 1.000. Analysis of feature distributions shows that AI-generated text exhibits increased lexical diversity, reduced readability, and substantially lower emotional intensity compared to the overall dataset, with variations across prompts. Despite these distributional shifts, the classifier maintains strong performance, indicating that these features capture stable properties of AI-generated text that generalize across prompting strategies. These findings suggest that feature-based approaches can provide robust detection of AI-generated fake news under prompt variability.
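The cross-prompt setup described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The feature definitions (type-token ratio for lexical diversity, Flesch Reading Ease for readability, a lexicon hit rate for emotional intensity), the toy emotion lexicon, the syllable heuristic, and the prompt names are all assumptions chosen for illustration, since the abstract does not specify them. The classifier is omitted; `auc` only shows how a ranking score would be evaluated for each train-test pair.

```python
import re
from itertools import permutations

# Hypothetical toy lexicon; the abstract does not name the emotion resource used.
EMOTION_WORDS = {"outrage", "shocking", "fear", "terrifying",
                 "furious", "love", "hate", "disaster"}


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Crude heuristic: number of vowel groups, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def extract_features(text):
    """Return three interpretable features for one article."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = max(1, len(tokens))        # guard against empty input
    n_sentences = max(1, len(sentences))
    syllables = sum(count_syllables(t) for t in tokens)
    return {
        # Lexical diversity: unique tokens / total tokens.
        "ttr": len(set(tokens)) / n_tokens,
        # Readability: Flesch Reading Ease (higher means easier to read).
        "flesch": 206.835
                  - 1.015 * (n_tokens / n_sentences)
                  - 84.6 * (syllables / n_tokens),
        # Emotional intensity: share of tokens found in the toy lexicon.
        "emotion": sum(t in EMOTION_WORDS for t in tokens) / n_tokens,
    }


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_prompt_pairs(prompts):
    """All ordered (train_prompt, test_prompt) pairs with train != test."""
    return list(permutations(prompts, 2))


if __name__ == "__main__":
    # Placeholder prompt names; three prompts give six train-test pairs.
    for train, test in cross_prompt_pairs(["prompt_A", "prompt_B", "prompt_C"]):
        print(f"train on {train}, test on {test}")
    print(extract_features("Shocking disaster strikes. Officials respond."))
```

In a full pipeline, each pair would correspond to fitting a classifier (a random forest in the paper) on features from real articles plus the training prompt's AI-generated articles, then scoring articles generated under the test prompt and reporting the AUC.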