Graph Set Transformer

2026-06-03

Machine Learning
AI summary

The authors introduce the Graph Set Transformer (GST), a neural network architecture for learning from sets of graphs, where the prediction for each element depends on both that graph's local structure and the context of the whole set. Unlike earlier methods such as DeepSets and Set Transformer, which rely on a separate GNN to encode each graph before any set-level processing, GST combines local graph features and set-wide context at every layer through a gating mechanism. They evaluated GST on a synthetic test suite and on three real-data benchmarks covering reaction-centre identification in molecules, reaction yield prediction, and image classification, and report that it outperforms the baselines under matched parameter budgets. An ablation suggests that much of this advantage comes from interleaving local and set-level information throughout the network.

Graph Neural Network (GNN), Set Transformer, DeepSets, Feature Propagation, Cross-Graph Contextualisation, Gating Mechanism, Per-Element Prediction, Reaction-Centre Identification, Reaction Yield Prediction, Graph Set Learning
Authors
Jose E. Escrig Molina, Baoquan Chen, Daniel Probst
Abstract
We introduce the Graph Set Transformer (GST), a neural network architecture for learning on sets of graphs, designed for tasks in which per-element predictions depend on set-wide context as well as local structure. Existing architectures, including DeepSets and SetTransformer, require pre-encoded graph embeddings from a separate GNN, creating a bottleneck between feature extraction and set-level contextualisation. In contrast, GST interleaves node-level feature propagation and cross-graph contextual modelling at every layer, fusing the two levels of information through a gating mechanism. We evaluate GST on a controlled synthetic suite designed to isolate set-conditional structural reasoning and on three real-data benchmarks spanning per-atom reaction-centre identification, reaction yield prediction, and image classification. Under matched parameter budgets, GST performs better than the baselines across these settings. An architectural ablation strongly suggests that the interleaving of local and set context contributes substantially to this advantage.
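
The abstract describes each GST layer as interleaving node-level propagation with cross-graph context and fusing the two through a gate, but it does not give the layer equations. The NumPy sketch below is one possible reading of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: mean-aggregation message passing, mean pooling to one summary per graph, single-head dot-product attention across those summaries, a sigmoid gate, and all function and parameter names (`interleaved_layer`, `init_params`, `W_gate`, and so on).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(dim, rng):
    """Random weights for one layer (hypothetical parameter names)."""
    scale = 1.0 / np.sqrt(dim)
    params = {name: rng.normal(0.0, scale, (dim, dim))
              for name in ("W_self", "W_nbr", "W_q", "W_k", "W_v")}
    params["W_gate"] = rng.normal(0.0, scale, (2 * dim, dim))
    params["b_gate"] = np.zeros(dim)
    return params


def interleaved_layer(node_feats, adjs, params):
    """One illustrative layer over a set of graphs.

    node_feats: list of (n_i, d) arrays, one per graph in the set.
    adjs:       list of (n_i, n_i) adjacency matrices.
    Returns a list of (n_i, d) arrays.
    """
    dim = node_feats[0].shape[1]

    # Step 1 (local): one round of mean-aggregation message passing per graph.
    local = []
    for h, a in zip(node_feats, adjs):
        deg = np.maximum(a.sum(axis=1, keepdims=True), 1.0)  # avoid divide-by-zero
        msg = (a @ h) / deg
        local.append(np.tanh(h @ params["W_self"] + msg @ params["W_nbr"]))

    # Step 2 (set): pool each graph to a summary, then attend across summaries.
    summaries = np.stack([h.mean(axis=0) for h in local])  # (m, d)
    q = summaries @ params["W_q"]
    k = summaries @ params["W_k"]
    v = summaries @ params["W_v"]
    attn = softmax(q @ k.T / np.sqrt(dim), axis=-1)  # (m, m)
    context = attn @ v  # (m, d): one context vector per graph

    # Step 3 (fusion): a per-node sigmoid gate mixes local and set-level features.
    out = []
    for h, c in zip(local, context):
        c_nodes = np.broadcast_to(c, h.shape)  # same context for every node
        gate_in = np.concatenate([h, c_nodes], axis=1)
        g = sigmoid(gate_in @ params["W_gate"] + params["b_gate"])
        out.append(g * h + (1.0 - g) * c_nodes)
    return out


# Usage: a set of three random graphs of different sizes, passed through two layers.
rng = np.random.default_rng(0)
dim = 8
sizes = (4, 6, 3)
feats = [rng.normal(size=(n, dim)) for n in sizes]
adjs = []
for n in sizes:
    upper = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
    adjs.append(upper + upper.T)  # symmetric, no self-loops

h = feats
for layer_params in [init_params(dim, rng) for _ in range(2)]:
    h = interleaved_layer(h, adjs, layer_params)
print([x.shape for x in h])
```

Because the set-level step runs inside every layer, node features in one graph already depend on the other graphs in the set after the first layer. A DeepSets-style or Set Transformer-style pipeline, as characterised in the abstract, would instead run all GNN layers first and only then combine the resulting graph embeddings.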