PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
2026-04-10 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors studied how giving large language models (LLMs) different 'personas'—like characters or roles—affects their behavior. They found that training models with reinforcement learning using verifiable rewards (RLVR) makes the models less sensitive to persona changes, which helps with consistent task performance but can reduce how well the model sticks to a persona when needed. To fix this, they created PerMix-RLVR, a new training method that balances keeping the model stable against unwanted persona shifts and still allowing it to act in character when appropriate. Their approach improved both persona stability and how well the model expresses personas in tests.
Persona prompting, Large Language Models, Reinforcement Learning, Verifiable Rewards, Persona Sensitivity, Persona Fidelity, Persona Stability, Role-playing, Prompting, Model Robustness
Authors
Jihwan Oh, Soowon Oh, Murad Aghazada, Minchan Jeong, Sungnyun Kim, Se-Young Yun
Abstract
Persona prompting has been widely adopted to steer the behavior of large language models (LLMs) and improve their instruction performance by assigning specific characters. However, identifying an optimal persona is time-consuming, and its impact on output quality remains poorly understood. Prior work has mainly addressed this issue at the prompt level via inference-time strategies, incurring additional computation. In this work, we avoid inference-time prompt search by tackling persona sensitivity during training, aiming to train models that adapt their behavior to diverse personas while preserving task performance. In particular, we find that reinforcement learning with verifiable rewards (RLVR) systematically reduces sensitivity to persona prompts, but also reveals an inherent trade-off of outcome-based optimization: while RLVR improves robustness on tasks with verifiable goals, it can also degrade persona expressivity where it is needed, e.g., in in-character role-playing. To address this limitation, we propose PerMix-RLVR, a persona-mixed RLVR strategy that mitigates the persona robustness-fidelity trade-off, preserving strong robustness to harmful persona variation while enabling faithful persona adoption when required. Concretely, PerMix-RLVR improves persona stability score (PSS) over RLVR by +21.2% on MATH500, while also enhancing persona fidelity by +11.4% on PersonaGym.