Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching

2026-05-28 · Machine Learning

AI summary

The authors propose HullFT, a method for making test-time finetuning (TTFT) faster and more effective. TTFT adapts a language model on the fly by finetuning it on examples retrieved for each prompt, but this per-prompt work can be slow. HullFT uses a geometric method to select a small set of examples that is both relevant and diverse. It then converts the selection weights into a multiset in which some examples repeat, and reuses computation across those repeats to speed up training steps. Their experiments show that HullFT achieves lower bits-per-byte than prior TTFT methods at substantially lower total runtime.

test-time finetuning, language model, query embedding, Frank-Wolfe optimization, convex combination, diversity-aware selection, gradient reuse, bits-per-byte, integerization
Authors
Alaa Khamis, Alaa Maalouf
Abstract
Test-time finetuning (TTFT) is a rapidly evolving paradigm that adapts a language model to each prompt by retrieving related sequences, updating the model on them, and then evaluating the prompt. However, TTFT is only practical if it is fast: selection and finetuning both happen per query, making each a direct bottleneck. Existing methods trade speed for quality: fast retrieval is often redundant, while stronger diversity-aware selection adds prohibitive per-query cost. We introduce HullFT, a geometric approach to TTFT that addresses both bottlenecks. Given a query, HullFT first represents the query embedding as a sparse convex combination of the embeddings of a few training sequences, using efficient projection-free Frank-Wolfe optimization. This yields a support set that is inherently relevant and diverse. We then convert the fractional convex weights into an exact integer multiset for finetuning through a geometric integerization procedure. The resulting multiplicities naturally create repeated examples, which we exploit with Gradient Reuse to amortize forward-backward computation across repeated finetuning steps. Our experiments show that HullFT improves the quality-efficiency tradeoff over current state-of-the-art TTFT methods, achieving lower bits-per-byte at substantially lower total runtime.
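The selection pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions. Candidate sequences and the query are assumed to be already embedded as vectors. Frank-Wolfe with exact line search minimizes the squared reconstruction error over the probability simplex. Plain largest-remainder rounding stands in for the paper's geometric integerization procedure, whose details are not given here. The function names (`frank_wolfe_simplex`, `largest_remainder`, `passes`) and the step `budget` are hypothetical, and `passes` is only a toy cost model for Gradient Reuse, not the paper's scheme.

```python
import numpy as np


def frank_wolfe_simplex(X, q, iters=20):
    """Approximate query q as a convex combination of the rows of X.

    Minimizes ||X.T @ w - q||^2 over the probability simplex using
    Frank-Wolfe with exact line search. Each iteration adds at most one
    new row to the support, so w has at most iters + 1 nonzeros.
    """
    n = X.shape[0]
    # Start at the vertex (candidate embedding) closest to the query.
    i0 = int(np.argmin(np.linalg.norm(X - q, axis=1)))
    w = np.zeros(n)
    w[i0] = 1.0
    for _ in range(iters):
        r = X.T @ w - q            # residual of the current reconstruction
        grad = 2.0 * X @ r         # gradient of the objective w.r.t. w
        i = int(np.argmin(grad))   # linear minimization oracle: best vertex
        p = X[i] - X.T @ w         # equals X.T @ (e_i - w)
        denom = p @ p
        if denom <= 1e-12:
            break
        # Exact line search for a quadratic, clipped to [0, 1].
        gamma = float(np.clip(-(r @ p) / denom, 0.0, 1.0))
        if gamma == 0.0:           # zero Frank-Wolfe gap: already optimal
            break
        w *= 1.0 - gamma
        w[i] += gamma
    return w


def largest_remainder(w, budget):
    """Round simplex weights to nonnegative integers summing to budget.

    Stand-in for the paper's geometric integerization (an assumption).
    """
    w = np.asarray(w, dtype=float)
    scaled = (w / w.sum()) * budget
    counts = np.floor(scaled).astype(int)
    remainder = int(budget - counts.sum())
    # Hand the leftover units to the largest fractional parts.
    order = np.argsort(-(scaled - counts), kind="stable")
    counts[order[:remainder]] += 1
    return counts


def passes(counts):
    """Toy cost model (hypothetical): forward-backward passes needed
    without reuse (one per step) vs. with reuse (one per distinct example)."""
    return int(counts.sum()), int(np.count_nonzero(counts))


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    q = np.array([0.5, 0.5])
    w = frank_wolfe_simplex(X, q)
    counts = largest_remainder(w, budget=5)
    print(w)
    print(counts)          # [3 2 0]
    print(passes(counts))  # (5, 2)
```

In the toy example the query lies halfway between the first two candidates, so Frank-Wolfe recovers the weights `[0.5, 0.5, 0]` in one step and the rounding turns a budget of 5 steps into multiplicities `[3, 2, 0]`. Because each Frank-Wolfe iteration moves toward a single vertex, the support can grow by at most one sequence per iteration, which is what keeps the selected set sparse.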