When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
2026-02-25 • Human-Computer Interaction
Human-Computer Interaction · Artificial Intelligence · Computation and Language
AI summary
The authors study how large language models (LLMs) strip distinctive cultural language traits from non-native English speakers' text during generation, a process they call "Cultural Ghosting." Using two new metrics, they show that while the meaning of the text stays largely intact, important cultural markers, especially polite expressions, are frequently erased. They also find that adding explicit instructions to preserve cultural markers reduces this erasure without losing meaning. The work highlights a trade-off between preserving cultural identity and maintaining clear communication in AI-generated text.
Large Language Models · Cultural Ghosting · Identity Erasure Rate · Semantic Preservation Score · Non-native English · Pragmatic Markers · Lexical Markers · Prompt Engineering · Politeness Conventions · Text Processing
Authors
Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang
Abstract
Large Language Models (LLMs) are increasingly used to "professionalize" workplace communication, often at the cost of linguistic identity. We introduce "Cultural Ghosting", the systematic erasure of linguistic markers unique to non-native English varieties during text processing. Through analysis of 22,350 LLM outputs generated from 1,490 culturally marked texts (Indian, Singaporean, and Nigerian English) processed by five models under three prompt conditions, we quantify this phenomenon using two novel metrics: Identity Erasure Rate (IER) and Semantic Preservation Score (SPS). Across all prompts, we find an overall IER of 10.26%, with model-level variation from 3.5% to 20.5% (a 5.9x range). Crucially, we identify a Semantic Preservation Paradox: models maintain high semantic similarity (mean SPS = 0.748) while systematically erasing cultural markers. Pragmatic markers (politeness conventions) are 1.9x more vulnerable than lexical markers (71.5% vs. 37.1% erasure). Our experiments demonstrate that explicit cultural-preservation prompts reduce erasure by 29% without sacrificing semantic quality.
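To make the two metrics concrete, the following is a minimal sketch of how an Identity Erasure Rate and a Semantic Preservation Score could be computed for one source/output pair. The marker list, the example sentences, and the bag-of-words cosine similarity are all illustrative assumptions; the paper's actual implementation (in particular the semantic similarity model behind SPS) is not specified here and likely relies on sentence embeddings rather than word counts.

```python
# Illustrative sketch (not the authors' implementation): IER as the
# fraction of source cultural markers absent from the LLM output, and
# SPS as a bag-of-words cosine similarity stand-in for embedding-based
# semantic similarity.
from collections import Counter
import math

def identity_erasure_rate(source_markers, output_text):
    """Fraction of the source's cultural markers that no longer appear
    in the output (higher = more erasure)."""
    if not source_markers:
        return 0.0
    out = output_text.lower()
    erased = sum(1 for m in source_markers if m.lower() not in out)
    return erased / len(source_markers)

def semantic_preservation_score(source_text, output_text):
    """Cosine similarity over word counts; a self-contained stand-in
    for a learned semantic similarity model."""
    a = Counter(source_text.lower().split())
    b = Counter(output_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical Indian-English example with pragmatic/lexical markers.
source = "Kindly do the needful and revert back with the report."
output = "Please complete the task and reply with the report."
markers = ["kindly", "do the needful", "revert back"]

ier = identity_erasure_rate(markers, output)        # → 1.0 (all markers erased)
sps = semantic_preservation_score(source, output)   # high despite full erasure
```

The example reproduces the Semantic Preservation Paradox in miniature: every cultural marker is removed (IER = 1.0), yet the surface overlap with the source remains substantial.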