Beyond "To whom it may concern": Tailoring Machine Translation to Audience and Intent
2026-06-02 • Computation and Language
AI summary
The authors studied how translation quality with large language models (LLMs) changes when the purpose or audience of the translation is stated explicitly. Across 50 languages, 5 model sizes, and 8 text domains, they found that explicit instructions help translations fit their intended use, especially for informal texts, larger models, and higher-resource languages. Explicit instructions also worked better than few-shot examples or paragraph-level context. Standard translation metrics often miss these improvements and can even penalize adapted translations. When curated instructions are unavailable, models can generate their own from the surrounding document context, closing up to 80% of the gap to curated instructions. Overall, the authors show that purpose-driven machine translation is viable and measurable but needs purpose-aware evaluation methods.
Machine Translation, Large Language Models, Translation Adaptation, Translation Metrics, Few-shot Learning, Contextual Instructions, Informal Texts, Model Size, High-resource Languages, Purpose-driven Translation
Authors
Raphael Merx, Ekaterina Vylomova, Trevor Cohn
Abstract
Translation quality depends on purpose: the same source text demands different translations depending on audience, tone, and communicative intent. Yet MT models and metrics treat translation as a fixed mapping from source to target. LLMs enable users to explicitly specify purpose alongside source text, yet this capability has not been evaluated at scale. We introduce a systematic evaluation of purpose-driven MT across 50 languages, 5 model sizes and 8 text domains. We find that (1) explicit instructions substantially improve translation adaptedness, with larger gains on informal domains (conversation, social media), for larger model sizes and for higher-resource languages; (2) instructions outperform semantically-matched few-shot examples and paragraph-level context; (3) traditional MT metrics fail to capture adaptation quality, often penalizing adapted translations; (4) when curated instructions are unavailable, models can self-generate them from surrounding document context, closing up to 80% of the adaptedness gap to curated instructions. Our results establish that purpose-adapted MT is a viable and measurable capability of LLMs, while highlighting the need for purpose-aware metrics.
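To make two ideas from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain-text prompt layout for attaching a purpose instruction to a source text, and it assumes that "closing X% of the adaptedness gap" means the share of the baseline-to-curated score difference recovered by self-generated instructions. The function names `build_prompt` and `gap_closed`, the prompt wording, and the example scores are all hypothetical.

```python
from typing import Optional


def build_prompt(source: str, target_lang: str, purpose: Optional[str] = None) -> str:
    """Assemble a translation prompt (hypothetical layout, not from the paper).

    `purpose` carries the audience/intent instruction; when it is None the
    prompt falls back to a plain, purpose-agnostic translation request.
    """
    lines = [f"Translate the following text into {target_lang}."]
    if purpose:
        lines.append(f"Purpose: {purpose}")
    lines.append(f"Text: {source}")
    return "\n".join(lines)


def gap_closed(baseline: float, self_generated: float, curated: float) -> float:
    """Fraction of the baseline-to-curated gap recovered by self-generated instructions.

    Assumed definition: (self_generated - baseline) / (curated - baseline).
    """
    gap = curated - baseline
    if gap == 0:
        raise ValueError("curated and baseline scores are equal; the gap is undefined")
    return (self_generated - baseline) / gap


if __name__ == "__main__":
    print(build_prompt(
        "See you at 8?",
        "French",
        purpose="Casual text message between close friends",
    ))
    # Illustrative scores only, not results from the paper.
    print(gap_closed(baseline=50, self_generated=74, curated=80))  # 0.8
```

Under this assumed definition, a value of 0.8 corresponds to the "up to 80%" figure: self-generated instructions recover four fifths of the improvement that curated instructions provide over an uninstructed baseline.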