Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

2026-03-30 · Machine Learning

AI summary

The authors study how using extra related data (auxiliary tasks) can help machine learning models perform better on a main task. Focusing on two simple linear settings, they derive exact formulas that show when the extra data actually helps and how best to weight the tasks when combining them. For a class of linear neural networks with a shared representation, they give new conditions that explain when auxiliary data is beneficial. Along the way, they prove a new matrix perturbation bound that better captures the fine-grained structure of these models. Their findings are confirmed on simulated data.

transfer learning, linear regression, generalization error, bias-variance decomposition, linear neural networks, task weighting, low-rank perturbation, random matrices, under-parameterization, shared representations
Authors
Meitong Liu, Christopher Jung, Rui Li, Xue Feng, Han Zhao
Abstract
In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.