Investigation into In-Context Learning Capabilities of Transformers

2026-04-28 · Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors studied how transformers can learn new tasks on the spot from examples given at test time, focusing on a binary classification problem in which the two classes follow Gaussian distributions. They explored how performance depends on factors such as the input dimension, the number of examples shown in context, and the number of tasks seen during training. By working with a simplified linear formulation, they identified when the model can infer the task structure from context alone, and they found that it can memorize noisy in-context labels while still performing well on clean test data. Their experiments show how geometry, data quality, and prior training shape the success of this kind of learning, clarifying when in-context learning works well and when it struggles.

Transformers, In-Context Learning, Gaussian Mixture Model, Binary Classification, Linear Classifier, Benign Overfitting, Signal-to-Noise Ratio, Dimensionality, Task Diversity, Contextual Information
Authors
Rushil Chandrupatla, Leo Bangayan, Sebastian Leng, Arya Mazumdar
Abstract
Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the geometric conditions under which models successfully infer task structure from context alone. We additionally investigate the emergence of benign overfitting, where models memorize noisy in-context labels while still achieving strong generalization performance on clean test data. Through extensive sweeps across dimensionality, sequence length, task diversity, and signal-to-noise regimes, we identify the parameter regions in which this phenomenon arises and characterize how it depends on data geometry and training exposure. Our results provide a comprehensive empirical map of scaling behavior in in-context classification, highlighting the critical role of dimensionality, signal strength, and contextual information in determining when in-context learning succeeds and when it fails.
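
To make the setup in the abstract concrete, below is a minimal sketch (not the authors' code) of the kind of synthetic Gaussian-mixture ICL task and linear in-context classifier it describes. All names and parameter values (`d`, `n`, `signal_strength`, `label_noise`, the label-weighted-mean classifier) are illustrative assumptions; the paper's actual formulation, grids, and noise model may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(d, signal_strength):
    """Sample a random task: a unit-norm mean direction mu scaled by the signal strength."""
    mu = rng.standard_normal(d)
    return signal_strength * mu / np.linalg.norm(mu)

def sample_examples(mu, n, label_noise=0.0):
    """Draw n examples from the two-component Gaussian mixture x = y * mu + noise,
    with labels y in {-1, +1}; optionally flip a fraction of labels to simulate noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + rng.standard_normal((n, mu.shape[0]))
    flip = rng.random(n) < label_noise
    return x, np.where(flip, -y, y)

def linear_icl_predict(x_context, y_context, x_query):
    """A simple linear in-context classifier: estimate the task direction as the
    label-weighted mean of the context examples and classify queries by its sign."""
    w_hat = (y_context[:, None] * x_context).mean(axis=0)
    return np.sign(x_query @ w_hat)

# Hypothetical sweep point: dimension, context length, signal strength, label-noise rate.
d, n, signal_strength, label_noise = 64, 32, 3.0, 0.1
accs = []
for _ in range(500):                                        # 500 fresh tasks
    mu = sample_task(d, signal_strength)
    x_ctx, y_ctx = sample_examples(mu, n, label_noise)      # noisy in-context examples
    x_test, y_test = sample_examples(mu, 200, 0.0)          # clean test points
    accs.append(np.mean(linear_icl_predict(x_ctx, y_ctx, x_test) == y_test))
print(f"mean clean test accuracy: {np.mean(accs):.3f}")
```

Sweeping `d`, `n`, `signal_strength`, and `label_noise` in a loop like this is one way to reproduce the kind of dimensionality, sequence-length, and signal-to-noise grids the abstract refers to, including the benign-overfitting regime where the context labels are noisy but clean test accuracy remains high.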