Active Query Synthesis for Preference Learning
2026-05-25 • Machine Learning
AI summary
The authors address the challenge of learning user preferences efficiently without large amounts of labeled data. They note that some pairwise queries yield unreliable answers, especially when the compared items are nearly identical or entirely dissimilar. To handle this, they introduce a response model that accounts for how confident a user's answer is likely to be. They also develop a method called Info-Synth that synthesizes queries in a continuous space by maximizing a mutual information-based objective, avoiding the cost of scoring every candidate in a pool. Finally, they test their approach on synthetic preference learning, constrained text summary datasets, and controller gain tuning for a simulated mobile robot.
active learning, user preferences, query synthesis, mutual information, confidence modeling, pairwise comparisons, pool-based evaluation, preference learning, robot control tuning
Authors
Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport
Abstract
Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address the issue of feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.
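To make the idea of a mutual information-based query objective concrete, the following is a minimal sketch, not the authors' Info-Synth algorithm or their confidence-aware response model. It assumes a linear utility with a plain logistic (Bradley-Terry-style) response, represents the current belief over the preference vector by samples, and synthesizes a query by simple random search over a continuous box. All function names, the `scale` parameter, and the search procedure are hypothetical choices made for illustration.

```python
import numpy as np

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def response_prob(w_samples, a, b, scale=1.0):
    """P(a preferred over b | w) for each sampled w.

    Assumption: logistic response on a linear utility w @ x.
    This is NOT the confidence-aware model described in the abstract.
    """
    return 1.0 / (1.0 + np.exp(-scale * (w_samples @ (a - b))))

def mutual_information(w_samples, a, b, scale=1.0):
    """Monte Carlo estimate of I(response; w) for the query (a, b).

    I = H(E_w[p]) - E_w[H(p)], with expectations over the belief samples.
    """
    p = response_prob(w_samples, a, b, scale)
    return binary_entropy(p.mean()) - binary_entropy(p).mean()

def synthesize_query(w_samples, dim, n_candidates=2000, rng=None):
    """Pick the highest-MI pair among random pairs drawn from [-1, 1]^dim.

    Random search is a stand-in for a continuous optimizer.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best_pair, best_mi = None, -np.inf
    for _ in range(n_candidates):
        a, b = rng.uniform(-1.0, 1.0, size=(2, dim))
        mi = mutual_information(w_samples, a, b)
        if mi > best_mi:
            best_pair, best_mi = (a, b), mi
    return best_pair, best_mi

# Hypothetical belief over a 3-dimensional preference vector.
belief = np.random.default_rng(1).normal(size=(500, 3))
(query_a, query_b), query_mi = synthesize_query(belief, dim=3)
print("estimated mutual information of synthesized query:", query_mi)
```

Under this simplified model, a query that compares an item with itself carries zero information, since every sampled preference vector predicts a 50/50 response. The simple logistic model does not, however, capture the ambiguity of entirely dissimilar pairs, which is the gap the abstract's confidence-aware response model is meant to address.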