Order Matters in Retrosynthesis: Structure-aware Generation via Reaction-Center-Guided Discrete Flow Matching

2026-02-13

Machine Learning
AI summary

The authors show that the order in which atoms appear in neural representations of chemical reactions strongly affects how well models learn to predict synthetic routes. They design a new method, RetroDiT, which places the key reaction-center atoms at the front of the sequence so the model can attend to them first. This approach speeds up molecule generation and improves accuracy over earlier methods, even while using much less training data. Their experiments suggest that building in chemical structure knowledge is more effective than simply making models bigger.

Retrosynthesis, Template-free methods, Reaction center, Graph transformer, Rotary position embeddings, USPTO dataset, Discrete flow matching, Molecule generation, Positional inductive bias
Authors
Chenguang Wang, Zihan Zhou, Lei Bai, Tianshu Yu
Abstract
Template-free retrosynthesis methods treat the task as black-box sequence generation, limiting learning efficiency, while semi-template approaches rely on rigid reaction libraries that constrain generalization. We address this gap with a key insight: atom ordering in neural representations matters. Building on this insight, we propose a structure-aware template-free framework that encodes the two-stage nature of chemical reactions as a positional inductive bias. By placing reaction center atoms at the sequence head, our method transforms implicit chemical knowledge into explicit positional patterns that the model can readily capture. The proposed RetroDiT backbone, a graph transformer with rotary position embeddings, exploits this ordering to prioritize chemically critical regions. Combined with discrete flow matching, our approach decouples training from sampling and enables generation in 20–50 steps versus 500 for prior diffusion methods. Our method achieves state-of-the-art performance on both USPTO-50k (61.2% top-1) and the large-scale USPTO-Full (51.3% top-1) with predicted reaction centers. With oracle centers, performance reaches 71.1% and 63.4%, respectively, surpassing foundation models trained on 10 billion reactions while using orders of magnitude less data. Ablation studies further reveal that structural priors outperform brute-force scaling: a 280K-parameter model with proper ordering matches a 65M-parameter model without it.
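To make the reaction-center-first ordering concrete, the sketch below (not the authors' code) shows one way to renumber a molecule's atoms so that reaction-center atoms occupy the head of the atom sequence before the graph is tokenized. It assumes RDKit is available; the helper name reorder_center_first and the toy center indices are our own illustrative choices, and the center indices would in practice come from a separately trained reaction-center predictor.

```python
# Minimal sketch of reaction-center-first atom ordering (illustrative only,
# not the RetroDiT implementation). Assumes RDKit is installed.
from rdkit import Chem


def reorder_center_first(smiles: str, center_atom_idxs: list[int]) -> Chem.Mol:
    """Return a copy of the molecule renumbered so that reaction-center
    atoms occupy the first positions of the atom sequence."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")

    centers = [i for i in center_atom_idxs if i < mol.GetNumAtoms()]
    others = [i for i in range(mol.GetNumAtoms()) if i not in centers]

    # Chem.RenumberAtoms: new_order[i] is the *old* index of the atom that
    # becomes atom i in the returned molecule, so center atoms come first.
    new_order = centers + others
    return Chem.RenumberAtoms(mol, new_order)


if __name__ == "__main__":
    # Toy example: pretend atoms 2 and 3 form the reaction center.
    reordered = reorder_center_first("CC(=O)Oc1ccccc1", [2, 3])
    print(Chem.MolToSmiles(reordered, canonical=False))
```

Any permutation-equivariant graph encoder would see the same molecule either way; the point of the reordering is that position-dependent components such as rotary position embeddings can then treat low sequence positions as a consistent, chemically meaningful signal.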