Use What You Know: Causal Foundation Models with Partial Graphs

2026-02-16 · Machine Learning

AI summary

The authors focus on improving Causal Foundation Models (CFMs), which aim to learn cause-and-effect relationships in a single step rather than relying on many specialized methods. They point out that existing CFMs cannot incorporate domain knowledge, such as known causal connections, which can degrade their predictions. To fix this, the authors develop ways to condition CFMs on full or partial causal information, finding that injecting learnable biases into the model's attention mechanism works best. Their experiments show that with this conditioning, CFMs can perform as well as specialized models designed for specific causal structures. This work brings CFMs closer to being general tools that learn about causes from data while using expert knowledge whenever it is available.

Causal Inference, Causal Discovery, Causal Graph, Domain Knowledge, Attention Mechanism, Foundation Models, Amortized Inference, Causal Queries, Machine Learning, Partial Causal Information
Authors
Arik Reuter, Anish Dhir, Cristiana Diaconu, Jake Robertson, Ole Ossen, Frank Hutter, Adrian Weller, Mark van der Wilk, Bernhard Schölkopf
Abstract
Estimating causal quantities traditionally relies on bespoke estimators tailored to specific assumptions. Recently proposed Causal Foundation Models (CFMs) promise a more unified approach by amortising causal discovery and inference in a single step. However, in their current state, they do not allow for the incorporation of any domain knowledge, which can lead to suboptimal predictions. We bridge this gap by introducing methods to condition CFMs on causal information, such as the causal graph or more readily available ancestral information. When access to complete causal graph information is too strict a requirement, our approach also effectively leverages partial causal information. We systematically evaluate conditioning strategies and find that injecting learnable biases into the attention mechanism is the most effective method to utilise full and partial causal information. Our experiments show that this conditioning allows a general-purpose CFM to match the performance of specialised models trained on specific causal structures. Overall, our approach addresses a central hurdle on the path towards all-in-one causal foundation models: the capability to answer causal queries in a data-driven manner while effectively leveraging any amount of domain expertise.
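The conditioning strategy the abstract highlights, injecting learnable biases into the attention mechanism, can be illustrated with a short sketch. The code below is not the authors' implementation; the module and argument names (`CausalBiasAttention`, `edge_state`) are illustrative assumptions. It only shows the core idea: known, absent, and unknown edges of a (partial) causal graph each index a learnable scalar that is added to the attention logits between variables.

```python
# Minimal sketch (illustrative, not the paper's released code) of conditioning
# attention over variables on partial causal-graph information via learnable biases.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalBiasAttention(nn.Module):
    """Self-attention over d variables with one learnable bias per edge state.

    edge_state[i, j] encodes what is known about the edge j -> i:
      0 = known absent, 1 = known present, 2 = unknown (partial information).
    Each state indexes a learnable scalar added to the attention logits.
    """

    def __init__(self, dim: int, num_edge_states: int = 3):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Parameter(torch.zeros(num_edge_states))

    def forward(self, x: torch.Tensor, edge_state: torch.Tensor) -> torch.Tensor:
        # x: (batch, d, dim) variable embeddings; edge_state: (d, d) integer tensor.
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5  # (batch, d, d)
        logits = logits + self.edge_bias[edge_state]           # inject causal knowledge
        return F.softmax(logits, dim=-1) @ v


# Usage: mark known edges and non-edges, leave everything else as "unknown".
x = torch.randn(8, 5, 32)           # 8 datasets, 5 variables, 32-dim embeddings
edge_state = torch.full((5, 5), 2)  # everything unknown by default
edge_state[1, 0] = 1                # edge 0 -> 1 known to exist
edge_state[0, 1] = 0                # edge 1 -> 0 known to be absent
out = CausalBiasAttention(dim=32)(x, edge_state)
```

Because unknown edges map to their own learnable bias rather than being forced to present or absent, the same mechanism covers the full range from no causal knowledge to a completely specified graph.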