Why can genetic algorithms work in high-dimensional search spaces?

2026-06-29Neural and Evolutionary Computing

Neural and Evolutionary Computing
AI summary

The authors explain that a simple type of genetic algorithm behaves like a special kind of gradient descent when mutations are small, moving in directions guided by the loss function without explicitly calculating gradients. This process is noisy, which makes it slower than exact gradient descent, but the slowdown depends on a property of the loss called the Hessian's effective rank, not directly on the total number of parameters. Since neural networks often have a Hessian with a low effective rank, this might be why these genetic algorithms can work well even for very large problems. Essentially, the study connects genetic algorithms' behavior to familiar concepts in optimization.

genetic algorithmgradient descentmutation-selectionloss functionHessiananisotropic Gaussian noiseeffective rankneural networksoptimizationsearch space
Authors
Stephen Whitelam
Abstract
We show that the effective dynamics of the elitist $(1+M)$ genetic algorithm is, in the limit of small mutations, clipped gradient descent on the loss in the presence of anisotropic Gaussian white noise. In expectation, therefore, a simple mutation-selection genetic algorithm follows the gradient of the loss, without explicit calculation of gradients and without averaging over loss evaluations. The genetic algorithm is slower than gradient descent because of the noise that acts in directions transverse to the gradient. However, this slowdown is controlled not by the number of parameters of the search space but by the effective rank of the Hessian of the loss function. For the concentrated Hessian spectra observed in neural-network loss functions the effective rank can be far smaller than the number of parameters, which may explain why genetic algorithms can scale to large search spaces.