Hold-One-Shot-Out (HOSO) for Validation-Free Few-Shot CLIP Adapters

2026-03-04 • Computer Vision and Pattern Recognition

AI summary

The authors propose a way to improve CLIP adaptation, in which a pretrained CLIP model is tuned to recognize images from only a few labeled examples. Such methods normally blend pretrained knowledge with the new data using a ratio that is tuned on the test set or on an extra validation set. Their method, HOSO-Adapter, instead learns this ratio from a single shot held out of the few-shot training set while the adapter trains on the remaining examples, so no additional data is needed. Under this validation-free protocol it beats the CLIP-Adapter baseline by more than 4 percentage points on average across 11 datasets, and in the 8- and 16-shot settings it even beats CLIP-Adapter with the ratio chosen on the test set.

CLIP • few-shot learning • adaptation • blending ratio • validation-free • one-shot learning • support examples • ablation study • CLIP-Adapter

Authors

Chris Vorster, Mayug Maniparambil, Noel E. O'Connor, Noel Murphy, Derek Molloy

Abstract

In many CLIP adaptation methods, a blending ratio hyperparameter controls the trade-off between general pretrained CLIP knowledge and the limited, dataset-specific supervision from the few-shot cases. Most few-shot CLIP adaptation techniques report results by ablation of the blending ratio on the test set or require additional validation sets to select the blending ratio per dataset, and thus are not strictly few-shot. We present a simple, validation-free method for learning the blending ratio in CLIP adaptation. Hold-One-Shot-Out (HOSO) presents a novel approach for CLIP-Adapter-style methods to compete in the newly established validation-free setting. CLIP-Adapter with HOSO (HOSO-Adapter) learns the blending ratio using a one-shot, hold-out set, while the adapter trains on the remaining few-shot support examples. Under the validation-free few-shot protocol, HOSO-Adapter outperforms the CLIP-Adapter baseline by more than 4 percentage points on average across 11 standard few-shot datasets. Interestingly, in the 8- and 16-shot settings, HOSO-Adapter outperforms CLIP-Adapter even with the optimal blending ratio selected on the test set. Ablation studies validate the use of a one-shot hold-out mechanism, decoupled training, and improvements over the naively learnt blending ratio baseline. Code is released here: https://github.com/chris-vorster/HOSO-Adapter
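
The hold-out idea in the abstract can be illustrated with a minimal sketch. It is not the released implementation, and every name in it (`hold_one_shot_out`, `blend`, `pick_alpha`) is hypothetical. It assumes that "one-shot" means one example per class, that blending is a convex combination `alpha * adapted + (1 - alpha) * pretrained`, and it substitutes a simple grid search over the held-out loss for whatever learning procedure the paper actually uses. For brevity the toy blend is applied to class scores rather than to CLIP features.

```python
import math
import random


def hold_one_shot_out(support, seed=0):
    """Split {class: [examples]} into (train, holdout).

    Assumption: one randomly chosen example per class is held out.
    """
    rng = random.Random(seed)
    train, holdout = {}, {}
    for label, examples in support.items():
        if len(examples) < 2:
            raise ValueError("need at least 2 shots per class to hold one out")
        idx = rng.randrange(len(examples))
        holdout[label] = [examples[idx]]
        train[label] = examples[:idx] + examples[idx + 1:]
    return train, holdout


def blend(adapted, pretrained, alpha):
    """Convex combination of two equal-length vectors (assumed blending form)."""
    return [alpha * a + (1.0 - alpha) * p for a, p in zip(adapted, pretrained)]


def cross_entropy(scores, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def pick_alpha(holdout_items, grid):
    """Return the grid value with the lowest mean loss on the hold-out set.

    holdout_items: list of (adapted_scores, pretrained_scores, target_index).
    Grid search is a stand-in here, not the paper's learning procedure.
    """
    def mean_loss(alpha):
        total = sum(cross_entropy(blend(a, p, alpha), t) for a, p, t in holdout_items)
        return total / len(holdout_items)

    return min(grid, key=mean_loss)
```

In this sketch the adapter would be trained only on `train`, and `pick_alpha` would be called only on scores computed for `holdout`, which mirrors the decoupling of adapter training from blending-ratio selection described in the abstract.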
