When Does Structure Help? The Information Bonus of AlphaFold2 Representations over Protein Language Models

2026-06-02
Computational Engineering, Finance, and Science

AI summary

The authors compared two kinds of frozen protein representations, structure-derived AlphaFold2 and sequence-only ESM-2, to see which works better on different tasks. They introduced a metric, the information bonus, for deciding when the extra cost of 3D structural inference is worth paying over cheaper sequence data. They found that the sequence model does better at predicting binding affinity and flexibility, while the structural model is better at finding allosteric sites, which depend on 3D shape. The authors also identified a data leakage artifact: splitting data by residue rather than by protein leads to overly optimistic results. Their work helps decide when to choose structural models or sequence models in protein research.

AlphaFold2, ESM-2 embeddings, protein binding affinity, allostery, conformational flexibility, protein structure prediction, machine learning for proteins, cross-validation, representation learning, root-mean-square fluctuation (RMSF)
Authors
Kargi Chauhan
Abstract
AI scientist systems increasingly choose biological foundation models before they choose experiments. In protein pipelines, this creates a concrete engineering and scientific question: when is the cost of structural inference worth paying over a cheaper sequence-only model? We introduce the information bonus (IB), a task-level metric that measures the linearly accessible advantage of frozen single-sequence AlphaFold2 Evoformer representations over frozen ESM-2 embeddings under protein-level cross-validation. Across binding affinity regression (PDBbind, n=5,680), conformational flexibility (ATLAS molecular dynamics, 268 proteins), and allosteric-site classification (AlloSigDB, n=9,925 residues), IB is sharply mechanism-dependent. ESM-2 dominates binding affinity (IB=-0.141; Pearson r=0.449 vs. 0.307) and binary flexibility (IB=-0.060; AUROC 0.824 vs. 0.764; p=0.0017). AF2 single representations give the only above-chance allostery predictions (IB=+0.064; AUROC 0.548 vs. 0.485), revealing long-range geometric signal not recovered from sequence alone. We also identify a residue-level leakage artifact: naive residue splits inflate RMSF performance by 27-39% depending on the representation, enough to reverse representation rankings. These results turn representation selection into a measurable decision for AI-for-science systems.
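The two ideas in the abstract that lend themselves to a short illustration are the information bonus and the protein-level split. The sketch below is not the paper's implementation. It assumes that IB is the AF2 probe score minus the ESM-2 probe score on the same metric, an assumption that reproduces the reported IB values to within 0.001. The names `information_bonus` and `protein_level_folds` are hypothetical helpers introduced here. The fold assignment keeps every residue of a protein in the same fold, which is the property a naive residue-level split lacks.

```python
from collections import defaultdict


def information_bonus(score_af2: float, score_esm2: float) -> float:
    """Assumed form of IB: AF2 probe score minus ESM-2 probe score.

    Both scores must use the same metric (e.g. Pearson r or AUROC).
    Positive values favor the structural representation.
    """
    return score_af2 - score_esm2


def protein_level_folds(protein_ids, n_folds):
    """Assign residue indices to folds so that no protein spans two folds.

    protein_ids[i] is the protein that residue i belongs to.
    Returns a list of n_folds lists of residue indices.
    """
    by_protein = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        by_protein[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest proteins first, each into the
    # currently smallest fold (ties broken by protein id, then fold index).
    for pid in sorted(by_protein, key=lambda p: (-len(by_protein[p]), p)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_protein[pid])
    return folds


# Scores quoted in the abstract as (AF2, ESM-2) pairs:
# Pearson r for binding affinity, AUROC for the two classification tasks.
reported = {
    "binding affinity": (0.307, 0.449),
    "binary flexibility": (0.764, 0.824),
    "allosteric site": (0.548, 0.485),
}

for task, (af2, esm2) in reported.items():
    print(f"{task}: IB = {information_bonus(af2, esm2):+.3f}")
```

The printed differences come from the rounded scores in the abstract, so they can differ from the reported IB in the third decimal place. In a residue-level task such as RMSF prediction, each list returned by `protein_level_folds` would serve as one held-out test fold, so neighboring residues from the same protein never appear on both sides of a split.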