AI summary
The authors studied a method called Automated Instruction Revision (AIR), which helps large language models adapt to new tasks from a few examples by inducing simple rule-based instructions. They compared AIR with prompt optimization, retrieval-based methods, and fine-tuning across five benchmarks covering different types of tasks. The results showed that the best method depends on the task: AIR was strongest or near-best on label-remapping classification, KNN retrieval was best for closed-book question answering, and fine-tuning excelled at structured extraction and event-order reasoning. The authors suggest AIR is most useful when task behavior can be captured by compact, interpretable rules, while retrieval and fine-tuning are better for tasks that depend on source-specific knowledge or dataset-specific annotation patterns.
Automated Instruction Revision, large language models, prompt optimization, retrieval-based methods, fine-tuning, label remapping, structured extraction, logical reasoning, task adaptation
Authors
Solomiia Bilyk, Volodymyr Getmanskyi, Taras Firman
Abstract
This paper studies Automated Instruction Revision (AIR), a rule-induction-based method for adapting large language models (LLMs) to downstream tasks using limited task-specific examples. We position AIR within the broader landscape of adaptation strategies, including prompt optimization, retrieval-based methods, and fine-tuning. We then compare these approaches across a diverse benchmark suite designed to stress different task requirements, such as knowledge injection, structured extraction, label remapping, and logical reasoning. The paper argues that adaptation performance is strongly task-dependent: no single method dominates across all settings. Across five benchmarks, AIR was strongest or near-best on label-remapping classification, while KNN retrieval performed best on closed-book QA, and fine-tuning dominated structured extraction and event-order reasoning. AIR is most promising when task behavior can be captured by compact, interpretable instruction rules, while retrieval and fine-tuning remain stronger in tasks dominated by source-specific knowledge or dataset-specific annotation regularities.