Greater accessibility can amplify discrimination in generative AI

2026-03-23 · Computation and Language

AI summary

The authors studied how voice-enabled large language models (LLMs) treat users differently based on their voice. They found that these models show gender bias by associating certain words and jobs with a speaker's voice, even more than text-only models do. This means that using voice can make AI less fair, especially since many users rely on voice for accessibility. They also found that changing the pitch of the voice can help reduce this bias. Their work highlights the challenge of making AI both accessible and fair at the same time.

Keywords
large language models, voice interaction, gender bias, accessibility, paralinguistic cues, pitch manipulation, social bias, fairness in AI, discrimination, user experience
Authors
Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou, Valentin Hofmann, Katharina von der Wense, Anne Lauscher
Abstract
Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their training data. Moreover, text-based interfaces remain a barrier for many, for example, users with limited literacy, motor impairments, or mobile-only devices. Voice interaction promises to expand accessibility, but unlike text, speech carries identity cues that users cannot easily mask, raising concerns about whether accessibility gains may come at the cost of equitable treatment. Here we show that audio-enabled LLMs exhibit systematic gender discrimination, shifting responses toward gender-stereotyped adjectives and occupations solely on the basis of speaker voice, and amplifying bias beyond that observed in text-based interaction. Thus, voice interfaces do not merely extend text models to a new modality but introduce distinct bias mechanisms tied to paralinguistic cues. Complementary survey evidence ($n=1,000$) shows that infrequent chatbot users are most hesitant about undisclosed attribute inference and most likely to disengage when such practices are revealed. To demonstrate a potential mitigation strategy, we show that pitch manipulation can systematically regulate gender-discriminatory outputs. Overall, our findings reveal a critical tension in AI development: efforts to expand accessibility through voice interfaces simultaneously create new pathways for discrimination, demanding that fairness and accessibility be addressed in tandem.
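
To make the pitch-manipulation idea concrete, here is a minimal sketch, not the authors' pipeline, of how a speaker's recording could be pitch-shifted before it is sent to a voice-enabled LLM, so that responses to higher- and lower-pitched versions of the same utterance can be compared. It assumes the librosa and soundfile packages; the file names and the ±4-semitone shift are illustrative placeholders, not values from the paper.

```python
# Sketch: pitch-shift an utterance to probe (or attenuate) gendered voice cues.
# Assumes librosa and soundfile are installed; paths and shift sizes are hypothetical.
import librosa
import soundfile as sf

def shift_pitch(in_path: str, out_path: str, n_semitones: float) -> None:
    """Pitch-shift a mono audio file by n_semitones (positive = higher pitch)."""
    y, sr = librosa.load(in_path, sr=None, mono=True)   # keep the original sample rate
    y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_semitones)
    sf.write(out_path, y_shifted, sr)

# Example: generate a raised and a lowered version of the same utterance,
# then compare the model's responses to the two conditions against the original.
shift_pitch("speaker_original.wav", "speaker_up4.wav", +4.0)
shift_pitch("speaker_original.wav", "speaker_down4.wav", -4.0)
```

Because the textual content of the utterance is identical across conditions, any systematic shift in the model's adjective or occupation choices between the pitch-shifted versions can be attributed to the paralinguistic cue rather than to what was said.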