EpiFormer: Learning Antigen-Antibody Interactions for Epitope Prediction via Geometric Deep Learning

2026-06-02
Machine Learning

AI summary

The authors developed EpiFormer, a new computer model to better predict where antibodies bind on antigens, called epitopes. Unlike previous methods, their model mixes information between antibodies and antigens early in the process, helping it learn important interaction features. They also use special techniques to handle the problem of having very few examples of antibody-antigen pairs. EpiFormer performs much better than older methods and even learns known biology patterns on its own, such as favoring shape information over evolutionary data.

antibodies, antigens, epitope, computational epitope prediction, graph neural networks (GNN), cross-attention, class imbalance, encoder-decoder framework, F1 score, binding interface
Authors
Mansoor Ahmed, Huirong Chai, Haoxin Wang, Hemanth Venkateswara, Murray Patterson
Abstract
Antibodies neutralize foreign antigens by binding to specific surface regions called epitopes. Computational epitope prediction is critical for understanding immune recognition and guiding antibody engineering. However, existing methods face three fundamental challenges. First, antibody-aware models encode each chain independently and combine them only at a late stage, failing to capture the co-dependent structural features that define binding interfaces. Second and third, severe class imbalance and the scarcity of known antibody-antigen complexes render standard training objectives ineffective. We propose EpiFormer, a general encoder-decoder framework that addresses these challenges jointly. Our key design principle is interleaved cross-attention within GNN encoding layers, enabling bidirectional antigen-antibody information flow throughout representation learning rather than only at the output. This early-fusion principle is backbone-agnostic, providing consistent gains across GNN architectures from simple GCNs to equivariant models. We further show that sparsity-aware objectives are effective when paired with early-fusion architectures for the epitope prediction task. EpiFormer improves over the previous best method by over 40% in F1 score on standard benchmarks, demonstrating generalizability and cross-dataset transferability. Notably, EpiFormer discovers known biological principles as emergent behaviors of end-to-end training. The learned cross-attention gates favor antigen-to-antibody information flow, consistent with the asymmetric roles of the two chains at the binding interface, and the model's preference for geometric over evolutionary features aligns with the established finding that epitope residues are not evolutionarily conserved. The source code is available at: https://github.com/mansoor181/epiformer.git
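
To make the early-fusion idea concrete, the following is a minimal, dependency-free sketch of one encoder layer in which each graph takes a GCN-style step and then exchanges messages with the other graph through gated cross-attention. It is an illustration under stated assumptions, not the EpiFormer implementation. The function names, the mean-aggregation GCN step, the scalar sigmoid gates, the absence of learned query/key/value projections, and the toy data are all hypothetical choices made here for brevity.

```python
import math

def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gcn_step(h, adj, w):
    """GCN-style update: mean over neighbours (adj includes self-loops), linear map, ReLU."""
    norm_adj = [[v / sum(row) for v in row] for row in adj]
    z = matmul(matmul(norm_adj, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def cross_attend(queries, keys):
    """Scaled dot-product attention; keys double as values (assumption: no projections)."""
    scale = math.sqrt(len(queries[0]))
    scores = [[s / scale for s in row] for row in matmul(queries, transpose(keys))]
    return matmul(softmax_rows(scores), keys)

def fused_layer(h_ag, h_ab, adj_ag, adj_ab, w_ag, w_ab, gate_to_ag=0.0, gate_to_ab=0.0):
    """One early-fusion layer: intra-graph GCN step, then gated bidirectional cross-attention.

    gate_to_ag / gate_to_ab are hypothetical scalar gate logits; sigmoid(logit)
    scales how much the other graph's message is added to each node embedding.
    """
    h_ag = gcn_step(h_ag, adj_ag, w_ag)
    h_ab = gcn_step(h_ab, adj_ab, w_ab)
    msg_to_ag = cross_attend(h_ag, h_ab)  # antigen nodes attend over antibody nodes
    msg_to_ab = cross_attend(h_ab, h_ag)  # antibody nodes attend over antigen nodes
    g_ag, g_ab = sigmoid(gate_to_ag), sigmoid(gate_to_ab)
    new_ag = [[h + g_ag * m for h, m in zip(hr, mr)] for hr, mr in zip(h_ag, msg_to_ag)]
    new_ab = [[h + g_ab * m for h, m in zip(hr, mr)] for hr, mr in zip(h_ab, msg_to_ab)]
    return new_ag, new_ab

# Toy inputs (hypothetical): 4 antigen residues, 3 antibody residues, 2 features each.
AG = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
AB = [[0.2, 0.9], [0.7, 0.1], [0.4, 0.4]]
ADJ_AG = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]  # chain graph + self-loops
ADJ_AB = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
W = [[1.0, 0.5], [-0.5, 1.0]]  # shared weight matrix, for brevity only

# Stacking layers interleaves fusion with encoding, instead of fusing once at the output.
ag, ab = AG, AB
for _ in range(2):
    ag, ab = fused_layer(ag, ab, ADJ_AG, ADJ_AB, W, W)
```

In this sketch, driving a gate logit strongly negative switches off the corresponding direction of information flow, which reduces the layer to independent per-graph encoding. A late-fusion baseline would correspond to keeping both gates closed in every layer and combining the two embeddings only after the final one.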