Contextual Linear Activation Steering of Language Models

2026-04-27

Computation and Language
AI summary

The authors develop Contextual Linear Activation Steering (CLAS), a method that improves how large language models are steered toward desired behavior. Unlike earlier methods that apply the same steering strength to every token regardless of the input, CLAS adjusts the strength dynamically based on the specific prompt. Tested across eleven benchmarks and four model families, CLAS performs as well as or better than popular alternatives such as ReFT and LoRA when only a small amount of labeled data is available. This makes CLAS a practical and interpretable tool for specializing large language models.

large language models, activation steering, linear activation, contextual adaptation, prompt engineering, fine-tuning, ReFT, LoRA, labeled data, model specialization
Authors
Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin
Abstract
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts the steering strength to the context of each input. Across eleven steering benchmarks and four model families, CLAS consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
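
The abstract does not specify the exact form of the context-dependent strength. As a rough illustration only, the PyTorch sketch below contrasts fixed-strength steering (h + alpha * v) with a context-dependent variant in which a learned linear gate maps each hidden state to a per-token strength. The gate, its sigmoid squashing, the max_strength bound, and all names here are assumptions made for illustration, not the authors' published CLAS formulation.

# A minimal sketch of context-dependent activation steering, assuming
# hidden states of shape (batch, seq_len, d_model). The gate below is a
# hypothetical stand-in for however CLAS computes its steering strength.
import torch
import torch.nn as nn

class ContextualSteering(nn.Module):
    """Adds a steering vector with an input-dependent strength.

    Standard linear steering uses a fixed scalar alpha:
        h' = h + alpha * v
    Here alpha is computed per token from the hidden state itself, so
    different prompts (and tokens) receive different strengths.
    """

    def __init__(self, steering_vector: torch.Tensor, max_strength: float = 8.0):
        super().__init__()
        d_model = steering_vector.numel()
        # Unit-normalize the direction so the gate alone controls magnitude.
        self.register_buffer("v", steering_vector / steering_vector.norm())
        # Hypothetical learned gate: one linear layer mapping each hidden
        # state to a scalar in (0, max_strength), fit on limited labeled data.
        self.gate = nn.Linear(d_model, 1)
        self.max_strength = max_strength

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model); alpha: (batch, seq_len, 1)
        alpha = self.max_strength * torch.sigmoid(self.gate(hidden))
        return hidden + alpha * self.v

if __name__ == "__main__":
    torch.manual_seed(0)
    d_model = 16
    steer = ContextualSteering(torch.randn(d_model))
    h = torch.randn(2, 5, d_model)  # fake hidden states for two prompts
    print(steer(h).shape)  # torch.Size([2, 5, 16])

In practice such a module would be applied to the residual stream at a chosen layer (for example via a forward hook) so that each token of each prompt receives its own steering strength, rather than one global scalar shared across all inputs.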