AI summary
The authors examine the population loss landscape of simple two-layer ReLU neural networks trained on Gaussian data in a teacher-student setting. They find that the local minima of this landscape can be described exactly by a small number of summary statistics, which gives a sharp, interpretable picture of its shape. They also link these minima to one-pass SGD, a common training method: local minima correspond to attractive fixed points of the dynamics in summary-statistics space. As the network gets wider, minima that are typically isolated become connected by flat directions, and global minima become easier for training to reach, so convergence to spurious solutions is reduced. The authors conclude that common simplifying assumptions can miss essential features of the loss landscape even in minimal network models.
ReLU networks, population loss landscape, teacher-student setting, Gaussian covariates, local minima, summary statistics, stochastic gradient descent (SGD), overparameterization, loss surface, neural network training dynamics
Authors
Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli
Abstract
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
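To make the idea of a low-dimensional representation concrete, here is a minimal NumPy sketch rather than the authors' own construction. It relies on a standard Gaussian identity: for $x \sim \mathcal{N}(0, I)$, $\mathbb{E}[\mathrm{ReLU}(u^\top x)\,\mathrm{ReLU}(v^\top x)] = \frac{\|u\|\|v\|}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)$, where $\theta$ is the angle between $u$ and $v$. The squared population loss therefore depends on the weights only through their pairwise overlaps. The $1/2$ loss normalisation, the plain SGD update without any dimension-dependent learning-rate scaling, and the names `relu_kernel`, `population_loss` and `one_pass_sgd` are all assumptions made for illustration.

```python
import numpy as np

def relu_kernel(A, B):
    """E[ReLU(a.x) ReLU(b.x)] for x ~ N(0, I), for every row a of A and b of B.

    Assumes every row has non-zero norm.
    """
    norms = np.outer(np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1))
    cos = np.clip((A @ B.T) / norms, -1.0, 1.0)  # clip guards against rounding
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)

def population_loss(W, W_star):
    """0.5 * E[(student(x) - teacher(x))^2], computed from overlaps only.

    W has shape (K, d) for the student, W_star has shape (K_star, d) for the
    teacher. The 0.5 prefactor is an assumed normalisation.
    """
    return 0.5 * (relu_kernel(W, W).sum()
                  - 2.0 * relu_kernel(W, W_star).sum()
                  + relu_kernel(W_star, W_star).sum())

def one_pass_sgd(W, W_star, lr, steps, rng):
    """One-pass SGD: each step uses a fresh Gaussian sample exactly once."""
    W = W.copy()
    d = W.shape[1]
    for _ in range(steps):
        x = rng.standard_normal(d)
        pre = W @ x
        err = np.maximum(pre, 0.0).sum() - np.maximum(W_star @ x, 0.0).sum()
        # Gradient of 0.5 * err^2 with respect to w_k is err * 1[w_k.x > 0] * x.
        W -= lr * err * np.outer(pre > 0, x)
    return W
```

Because `population_loss` sees the weights only through norms and angles, any rotation applied jointly to student and teacher, or any permutation of the student units, leaves it unchanged. Evaluating it along a trajectory returned by `one_pass_sgd` is one way to check numerically whether the dynamics settle at a global minimum or at a spurious one.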