Prompt Programming for Cultural Bias and Alignment of Large Language Models

2026-03-17

Artificial Intelligence · Computation and Language
AI summary

The authors studied how large language models (LLMs) can be biased toward certain cultures, which can cause problems when these models are used to help make decisions or analyze information. They tested an existing method that uses surveys to measure cultural alignment on open-source LLMs, confirming that cultural biases still exist. Then, they introduced a new way to improve cultural alignment by treating prompts as programmable pieces that can be optimized automatically. Their experiments showed that this new approach often works better than manually designing prompts, offering a more consistent way to make LLM responses fit different cultural values.

Keywords
large language models, cultural alignment, prompt engineering, survey-based metrics, open-weight models, cultural bias, prompt programming, DSPy, optimization, value alignment
Authors
Maksim Eren, Eric Michalak, Brian Cook, Johnny Seales
Abstract
Culture shapes reasoning, values, prioritization, and strategic decision-making, yet large language models (LLMs) often exhibit cultural biases that misalign with target populations. As LLMs are increasingly used for strategic decision-making, policy support, and document engineering tasks such as summarization, categorization, and compliance-oriented auditing, improving cultural alignment is important for ensuring that downstream analyses and recommendations reflect target-population value profiles rather than default model priors. Previous work introduced a survey-grounded cultural alignment framework and showed that culture-specific prompting can reduce misalignment, but it primarily evaluated proprietary models and relied on manual prompt engineering. In this paper, we validate and extend that framework by reproducing its social-science survey-based projection and distance metrics on open-weight LLMs, testing whether the same cultural skew and benefits of culture conditioning persist outside closed LLM systems. Building on this foundation, we introduce the use of prompt programming with DSPy for this problem, treating prompts as modular, optimizable programs, to systematically tune cultural conditioning by optimizing against cultural-distance objectives. In our experiments, we show that prompt optimization often improves upon manual cultural prompt engineering, suggesting that prompt compilation with DSPy can provide a more stable and transferable route to culturally aligned LLM responses.
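To illustrate the kind of objective the abstract describes, the sketch below shows a cultural-distance metric wrapped in the metric interface that DSPy optimizers accept. This is not the authors' code: the Euclidean distance over survey-response vectors and the field names `target_profile` and `profile` are illustrative assumptions, standing in for whatever survey projection the framework actually uses.

```python
import math

def cultural_distance(target, predicted):
    """Euclidean distance between two survey-response vectors, e.g. a
    target population's mean answers per survey question versus the
    answers elicited from the model. (Illustrative choice of distance.)"""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, predicted)))

def alignment_metric(example, prediction, trace=None):
    """DSPy-style metric (higher is better): invert the distance so that
    a perfectly aligned response scores 1.0 and scores decay toward 0."""
    d = cultural_distance(example["target_profile"], prediction["profile"])
    return 1.0 / (1.0 + d)

# In DSPy, an optimizer such as dspy.BootstrapFewShot(metric=alignment_metric)
# could then compile a prompt program against a trainset of culture-conditioned
# examples, tuning the prompt to minimize cultural distance.
```

The inversion in `alignment_metric` matters because DSPy optimizers maximize their metric, while the paper's objective is a distance to be minimized.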