Multi-Task Bayesian In-Context Learning

2026-06-18Machine Learning

Machine Learning
AI summary

The authors propose a new way for computers to learn how to make predictions by understanding different types of prior knowledge directly from data examples. They use a transformer model that treats prior information as part of the input, allowing it to adjust predictions when faced with new or different prior knowledge. Their method is faster than traditional Bayesian approaches and performs well even when tested on unfamiliar scenarios or complex data. They also tested their approach on a real-world temperature prediction task, showing its practical usefulness.

Bayesian inferencepredictive distributiontransformerin-context learningprior distributionhierarchical Bayesian modelamortized inferencedistribution shiftspatiotemporal predictionmachine learning
Authors
Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho
Abstract
Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.