Sparsity and Out-of-Distribution Generalization
2026-03-08 • Machine Learning
Machine LearningArtificial Intelligence
AI summaryⓘ
The authors address the problem of how machines can make accurate predictions on new data that looks different from the training data, known as out-of-distribution (OOD) generalization. They suggest that learning should focus on a few important features rather than all available information, guided by a principle called Occam's Razor which favors simpler explanations. Their main result is a theorem showing that if training and test data overlap enough on these key features, the learned model will generalize well, even if other features change a lot. They also extend their approach to cases where the important features lie in a low-dimensional subspace.
Out-of-Distribution GeneralizationGoodman’s Grue PuzzleOccam's RazorSparse HypothesesSample ComplexityBlumer et al. BoundFeature SelectionMachine LearningClassifierSubspace Juntas
Authors
Scott Aaronson, Lin Lin Lee, Jiawei Li
Abstract
Explaining out-of-distribution generalization has been a central problem in epistemology since Goodman's "grue" puzzle in 1946. Today it's a central problem in machine learning, including AI alignment. Here we propose a principled account of OOD generalization with three main ingredients. First, the world is always presented to experience not as an amorphous mass, but via distinguished features (for example, visual and auditory channels). Second, Occam's Razor favors hypotheses that are "sparse," meaning that they depend on as few features as possible. Third, sparse hypotheses will generalize from a training to a test distribution, provided the two distributions sufficiently overlap on their restrictions to the features that are either actually relevant or hypothesized to be. The two distributions could diverge arbitrarily on other features. We prove a simple theorem that formalizes the above intuitions, generalizing the classic sample complexity bound of Blumer et al. to an OOD context. We then generalize sparse classifiers to subspace juntas, where the ground truth classifier depends solely on a low-dimensional linear subspace of the features.