Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
2026-06-11 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors argue that retrieving examples by lexical or semantic similarity is not enough for complex reasoning, because similar-looking problems can require very different approaches. They propose RA-RFT, which trains a retriever to rank examples by how much they help reasoning rather than by how similar they look, and then uses reinforcement fine-tuning to teach the model to use the retrieved examples. On hard math benchmarks, the method outperforms standard reinforcement fine-tuning, and the retrieved examples supply complementary problem-solving strategies.
Retrieval-Augmented Generation, Reinforcement Learning, Reasoning by Analogy, Gold-Relevance Distillation, Lexical Similarity, Semantic Similarity, Mathematical Reasoning, Context Retrieval, Fine-Tuning, AIME Benchmark
Authors
Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen, Avinash Atreya, Hanjie Chen, Vicente Ordonez
Abstract
Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning methods with retrieved analogous demonstrations, so the model learns to leverage reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B, respectively, suggesting that reasoning-aware retrieval is a complementary axis of improvement, orthogonal to advances in reward design or training curricula.
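The abstract reports results as average@32 accuracy without defining the metric. The sketch below shows one common reading, assumed here rather than taken from the paper: sample k answers per problem, mark each as correct or incorrect, compute the fraction correct for each problem, and average those fractions over all problems. The function name `average_at_k` and the toy data are hypothetical and used only for illustration.

```python
from typing import Sequence


def average_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return mean per-problem accuracy over k sampled answers, in percent.

    correct[i][j] is True when the j-th sampled answer to problem i
    passed the answer check. Assumed reading of "average@k"; the
    abstract itself does not define the metric.
    """
    if not correct:
        raise ValueError("need at least one problem")
    if any(len(samples) == 0 for samples in correct):
        raise ValueError("every problem needs at least one sampled answer")
    # Fraction of correct samples for each problem.
    per_problem = [sum(samples) / len(samples) for samples in correct]
    # Average across problems, reported on a 0-100 scale.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: 2 problems with 4 samples each (the paper reports k = 32).
toy = [
    [True, True, False, False],   # 2 of 4 correct
    [True, False, False, False],  # 1 of 4 correct
]
print(average_at_k(toy))
```

Under this reading, a gain of 7.1 points means the mean per-problem success rate over the 32 samples rose by 7.1 percentage points. Averaging over many samples also makes the score less sensitive to sampling noise than a single-attempt accuracy.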