Robust Regression of General ReLUs with Queries

2026-06-09 · Machine Learning

AI summary

The authors study how to learn general ReLU functions (simple models used in machine learning that, unlike homogeneous ReLUs, include an offset term) when the inputs follow a Gaussian distribution and error is measured by squared loss. Earlier work addressed the passive setting, where the learner only receives labeled examples. The authors instead use an interactive approach in which the learner can query the labels of unlabeled examples. This needs fewer labels while still achieving error within a constant factor of the best possible ReLU, plus a small additive term. They also prove that their method's number of label queries is nearly the best achievable, and show that query access is essentially necessary to use fewer labels than passive learning. In particular, in pool-based active learning, any learner needs many labels unless it draws an impractically (super-polynomially) large set of unlabeled examples.

ReLU, agnostic learning, Gaussian distribution, interactive learning, label queries, passive learning, active learning, squared loss, query complexity, machine learning
Authors
Ilias Diakonikolas, Daniel M. Kane, Mingchen Ma
Abstract
We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses $\mathrm{poly}(d,1/\epsilon)$ labeled examples and outputs a hypothesis with error $O(\mathrm{opt})+\epsilon$, where $\mathrm{opt}$ is the squared loss of the best fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses $d\,\mathrm{polylog}(1/\epsilon)+\tilde{O}(\min\{1/p, 1/\epsilon\})$ black-box label queries, where $p$ is the bias of the target function, and achieves error $O(\mathrm{opt})+\epsilon$. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires $\tilde{\Omega}(d/\epsilon)$ labels, unless it draws a super-polynomial number of unlabeled examples.
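To make the setting concrete, here is a minimal Python sketch of the objects the abstract refers to. It is not the authors' algorithm. It covers a general ReLU $x \mapsto \max(0, \langle w, x\rangle + b)$, Gaussian inputs $x \sim N(0, I_d)$, and the empirical squared loss. The helper names (`general_relu`, `bias`, `demo`) and the toy noise model are hypothetical. The sketch also assumes that the "bias" $p$ of the target means $\Pr_{x}[\langle w,x\rangle + b > 0]$, which equals $\Phi(b)$ when $\|w\| = 1$. The paper may define $p$ differently.

```python
import math
import random


def relu(z: float) -> float:
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def general_relu(w: list[float], b: float, x: list[float]) -> float:
    """General (non-homogeneous) ReLU: x -> max(0, <w, x> + b); b = 0 is homogeneous."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gaussian_sample(d: int, rng: random.Random) -> list[float]:
    """One draw x ~ N(0, I_d)."""
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def empirical_squared_loss(w, b, xs, ys) -> float:
    """Mean of (h(x) - y)^2 over the labeled sample, for h = general_relu(w, b, .)."""
    return sum((general_relu(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bias(b: float) -> float:
    """ASSUMED reading of the bias p: Pr[<w, x> + b > 0] with x ~ N(0, I_d), ||w|| = 1.

    Then <w, x> ~ N(0, 1), so p = Phi(b), the standard normal CDF evaluated at b.
    """
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))


def demo(d: int = 3, b: float = -2.0, n: int = 100_000,
         noise: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Hypothetical toy experiment: a unit-norm target ReLU plus Gaussian label noise."""
    rng = random.Random(seed)
    w = [1.0] + [0.0] * (d - 1)  # unit-norm weight vector
    xs = [gaussian_sample(d, rng) for _ in range(n)]
    # Noisy labels: y = f(x) + eta, with eta ~ N(0, noise^2) independent of x.
    ys = [general_relu(w, b, x) + rng.gauss(0.0, noise) for x in xs]
    return {
        # Fraction of inputs on which the target ReLU is active (should be close to p).
        "fraction_active": sum(general_relu(w, b, x) > 0 for x in xs) / n,
        "bias_formula": bias(b),
        # Loss of the target itself: close to the noise variance noise**2.
        "loss_target": empirical_squared_loss(w, b, xs, ys),
        # Loss of the all-zero predictor, for comparison.
        "loss_zero": sum(y * y for y in ys) / n,
    }
```

With the defaults, `demo()["fraction_active"]` should land near $\Phi(-2) \approx 0.023$. Under the assumed reading of $p$, the target ReLU is nonzero on only about a $p$-fraction of random examples. A learner that labels random examples therefore sees its first informative label only after roughly $1/p$ draws. This is offered as intuition for why a $1/p$-type term can appear, not as the paper's argument.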