Active Bipartite Ranking with Smooth Posterior Distributions

2026-02-27

Machine Learning
AI summary

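The authors study bipartite ranking: learning a scoring rule that orders observations so that those with a positive binary label tend to be ranked above those with a negative one. They focus on an active learning setting in which the conditional probability of a positive label varies smoothly (it satisfies a Hölder condition), whereas previous work assumed it to be piecewise constant. They show that a naive approach fails in this continuous case in general. That approach applies a uniform discretization at a level fixed in advance, followed by the strategy designed for the discrete setting. They therefore propose a new algorithm, smooth-rank, which aims to minimize the sup-norm distance between the ROC curve of the estimated ranking rule and the optimal ROC curve. They prove that smooth-rank is PAC(ε, δ), meaning that with probability at least 1 - δ its ROC curve lies within ε of the optimal one. They also derive problem-dependent upper and lower bounds on the expected sampling time, and report numerical experiments in which smooth-rank compares favorably with alternative approaches.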
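The guarantee described above can be stated in symbols under the standard definitions of the bipartite ranking literature. The notation ($\eta$, $L$, $\beta$, $d_\infty$, $\hat{s}$) is ours, and the paper's exact formulation may differ; in particular, the precise form of the Hölder condition it assumes is not given in the abstract.

$$\eta(x) = \mathbb{P}(Y = +1 \mid X = x), \qquad |\eta(x) - \eta(x')| \le L\,\|x - x'\|^{\beta} \ \text{ for all } x, x' \quad (\text{Hölder smoothness, } 0 < \beta \le 1)$$

$$\mathrm{ROC}_s : t \mapsto \big(\mathbb{P}(s(X) > t \mid Y = -1),\ \mathbb{P}(s(X) > t \mid Y = +1)\big), \qquad \mathrm{ROC}^* = \mathrm{ROC}_{\eta}$$

$$d_\infty(s) = \sup_{\alpha \in [0,1]} \big|\mathrm{ROC}^*(\alpha) - \mathrm{ROC}_s(\alpha)\big|, \qquad \text{PAC}(\varepsilon, \delta):\ \ \mathbb{P}\big(d_\infty(\hat{s}) \le \varepsilon\big) \ge 1 - \delta$$

Here $\mathrm{ROC}_s(\alpha)$ is the true positive rate of the scoring function $s$ at false positive rate $\alpha$, with jumps of the curve joined by line segments. $\mathrm{ROC}^*$ dominates $\mathrm{ROC}_s$ pointwise for every $s$, which is why ranking by $\eta$ is optimal. An algorithm returning $\hat{s}$ is PAC$(\varepsilon, \delta)$ when the last inequality holds.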

bipartite ranking, active learning, ROC curve, Hölder smoothness, PAC (Probably Approximately Correct), discretisation, sampling complexity, statistical learning
Authors
James Cheshire, Stephan Clémençon
Abstract
In this article, bipartite ranking, a statistical learning problem involved in many applications and widely studied in the passive context, is approached in a much more general active setting than the discrete one previously considered in the literature. While the latter assumes that the conditional distribution is piecewise constant, the framework we develop can handle continuous conditional distributions, provided that they satisfy a Hölder smoothness constraint. We first show that a naive approach generally fails. It consists of a uniform discretisation at a level fixed a priori, followed by the active strategy designed for the discrete setting. Instead, we propose a novel algorithm designed for the continuous setting, referred to as smooth-rank. It aims to minimise the distance, in $\sup$ norm, between the ROC curve of the estimated ranking rule and the optimal one. We show that, for a fixed accuracy level $ε>0$ and confidence parameter $δ\in (0,1)$, smooth-rank is PAC$(ε,δ)$. In addition, we provide a problem-dependent upper bound on the expected sampling time of smooth-rank and establish a problem-dependent lower bound on the expected sampling time of any PAC$(ε,δ)$ algorithm. Beyond the theoretical analysis, numerical results provide solid empirical evidence of the performance of the proposed algorithm, which compares favorably with alternative approaches.
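The following sketch makes the sup-norm ROC criterion concrete. It uses a toy one-dimensional setting in which the posterior is known: X is uniform on [0, 1], $\eta(x) = 0.5 + 0.4\sin(2\pi x)$ (Lipschitz, hence Hölder with exponent 1), and the candidate scorer averages η over K uniform cells. The toy posterior, the function names (`roc_vertices`, `sup_roc_distance`, `cell_average_scorer`) and the grid-based evaluation are illustrative assumptions, not the authors' code. Because η is known exactly and no labels are sampled, the sketch illustrates only the evaluation criterion. It does not reproduce smooth-rank or the active sampling problem.

```python
import numpy as np


def roc_vertices(scores, eta, weights):
    """ROC vertices (fpr, tpr) of a scoring function evaluated on a grid.

    scores[i] = s(x_i), eta[i] = P(Y=+1 | X=x_i), weights[i] = P(X=x_i).
    Tied scores are merged into one step, so the curve is linear across
    each tie block (the expected ROC under random tie-breaking).
    """
    order = np.argsort(-scores, kind="stable")  # highest scores first
    s = scores[order]
    pos = (eta * weights)[order]          # positive-class mass of each grid point
    neg = ((1.0 - eta) * weights)[order]  # negative-class mass of each grid point
    tpr = np.cumsum(pos) / pos.sum()
    fpr = np.cumsum(neg) / neg.sum()
    block_end = np.append(s[1:] != s[:-1], True)  # last index of each tie block
    return (np.concatenate(([0.0], fpr[block_end])),
            np.concatenate(([0.0], tpr[block_end])))


def sup_roc_distance(scores, eta, weights, n_alpha=2001):
    """Approximate sup over alpha in [0, 1] of |ROC*(alpha) - ROC_s(alpha)| on an alpha grid."""
    alpha = np.linspace(0.0, 1.0, n_alpha)
    f_opt, t_opt = roc_vertices(eta, eta, weights)  # optimal scorer: eta itself
    f_s, t_s = roc_vertices(scores, eta, weights)
    return float(np.max(np.abs(np.interp(alpha, f_opt, t_opt)
                               - np.interp(alpha, f_s, t_s))))


def cell_average_scorer(x, eta, n_cells):
    """Hypothetical piecewise-constant scorer: mean of eta over n_cells uniform cells of [0, 1]."""
    cells = np.minimum((x * n_cells).astype(int), n_cells - 1)
    sums = np.bincount(cells, weights=eta, minlength=n_cells)
    counts = np.bincount(cells, minlength=n_cells)
    return (sums / counts)[cells]


if __name__ == "__main__":
    n = 20_000
    x = (np.arange(n) + 0.5) / n             # cell midpoints: X uniform on [0, 1]
    weights = np.full(n, 1.0 / n)
    eta = 0.5 + 0.4 * np.sin(2 * np.pi * x)  # toy Lipschitz posterior
    for k in (2, 8, 32):
        d = sup_roc_distance(cell_average_scorer(x, eta, k), eta, weights)
        print(f"K={k:3d}  sup-norm ROC distance = {d:.4f}")
```

Ordering cells by their average η gives the best ROC curve achievable by a scorer that is constant on those cells. Refining the partition into nested finer cells therefore cannot increase the distance, and in this toy example it shrinks as K grows from 2 to 8 to 32.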