Improved Algorithms for Clustering with Noisy Distance Oracles
2026-02-20 • Data Structures and Algorithms
AI summary
The authors improve on previous work by Bateni et al. that looked at clustering with limited distance information using a weak and strong oracle model. They adapt the popular k-means++ algorithm so it uses fewer expensive strong-oracle queries, while still giving good cluster approximations. For the k-center problem, they create a new method that provides better accuracy with fewer strong queries than the earlier approach. They also test their methods on real data and find their algorithms work better in practice than those by Bateni et al.
k-means clustering, k-means++, k-center problem, distance oracle, weak-strong oracle model, approximation algorithms, strong oracle queries, ball-carving, clustering approximation
Authors
Pinki Pradhan, Anup Bhattacharya, Ragesh Jaiswal
Abstract
Bateni et al. recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong oracle and the weak oracle in this model, they designed approximation algorithms for the $k$-means and $k$-center clustering problems. In this work, we design algorithms with improved guarantees for both problems in the weak-strong oracle model. The $k$-means++ algorithm is routinely used to solve $k$-means in settings where complete distance information is available. One of the main contributions of this work is to show that the $k$-means++ algorithm can be adapted to work in the weak-strong oracle model using only a small number of strong-oracle queries, which are the critical resource in this model. In particular, our $k$-means++ based algorithm gives a constant-factor approximation for $k$-means and uses $O(k^2 \log^2{n})$ strong-oracle queries. This improves on the algorithm of Bateni et al., which uses $O(k^2 \log^4{n} \log^2 \log n)$ strong-oracle queries for a constant-factor approximation of $k$-means. For the $k$-center problem, we give a simple ball-carving based $6(1 + ε)$-approximation algorithm that uses $O(k^3 \log^2{n} \log{\frac{\log{n}}{ε}})$ strong-oracle queries. This is an improvement over the $14(1 + ε)$-approximation algorithm of Bateni et al., which uses $O(k^2 \log^4{n} \log^2{\frac{\log{n}}{ε}})$ strong-oracle queries. To show the effectiveness of our algorithms, we perform empirical evaluations on real-world datasets and show that our algorithms significantly outperform the algorithms of Bateni et al.
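For readers unfamiliar with the baseline, the following is a minimal sketch of the classical $k$-means++ seeding step ($D^2$ sampling) in the full-information setting, where every exact distance is available. It is not the weak-strong oracle adaptation described in the abstract, and it does not model weak or strong oracle queries at all. The function names `sq_dist`, `kmeans_pp_seeding`, and `kmeans_cost` are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeding(points, k, rng=None):
    """Classical k-means++ seeding with exact distances (illustrative only).

    The first center is chosen uniformly at random. Each later center is
    sampled with probability proportional to its squared distance to the
    nearest center chosen so far (D^2 sampling).
    """
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; sample uniformly.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.0)]
    seeds = kmeans_pp_seeding(data, k=3, rng=random.Random(0))
    print(seeds, kmeans_cost(data, seeds))
```

Each round of this baseline reads the exact distance from every point to the newest center. In the weak-strong oracle model that kind of access is what the strong oracle provides, which is why the abstract treats the number of strong-oracle queries as the resource to minimize.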