AI summaryⓘ
The authors study how to detect a small hidden group inside a large random network where connections inside this group follow a special geometric pattern on a high-dimensional sphere, while the rest of the network looks random. They show that the usual way of checking individual edges doesn't work since edge probabilities are the same everywhere; instead, the detection relies on more subtle patterns in how edges depend on each other. They provide mathematical tests and boundaries to understand when detection is possible and when it is not, identifying a precise threshold related to the network size and geometry dimension. Additionally, they explore the gap between what is theoretically possible and what can be efficiently computed, providing evidence that some natural algorithms fall short.
random geometric graphErdős–Rényi modelcommunity detectioninformation-theoretic limitscomputational-statistical gapsigned triangle countslatent vectorsWishart-GOE comparisonKL divergencelow-degree polynomials
Abstract
We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, where a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edges follow the Erdős--Rényi model $\mathcal{G}(n, p)$. The random geometric graph is generated by thresholding inner products of latent vectors on $\mathbb{S}^{d-1}$, with each edge having marginal probability equal to $p$. This implies that $\mathcal{G}(n, p, d, k)$ and $\mathcal{G}(n, p)$ are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts: a global test, a scan test, and a constrained scan test; our lower bounds follow from two complementary methods: truncated second moment via Wishart--GOE comparison, and tensorization of KL divergence. These results together settle the detection threshold at $d = \widetildeΘ(k^2 \vee k^6/n^3)$ for fixed $p$, and extend the state-of-the-art bounds from the full model (i.e., $k = n$) for vanishing $p$. On the computational side, we identify a computational--statistical gap and provide evidence via the low-degree polynomial framework, as well as the suboptimality of signed cycle counts of length $\ell \geq 4$.