MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
2026-02-19 • Artificial Intelligence
AI summary
The authors developed MolHIT, a new method for generating molecular structures with diffusion models, a type of generative AI. It improves on earlier graph-based approaches by building chemical knowledge directly into the generation process and by treating atom types differently depending on their chemical roles. As a result, MolHIT produces chemically valid molecules more reliably and outperforms earlier models on the standard MOSES benchmark. The authors also showed that it works well for tasks such as designing molecules with specific properties and extending given molecular scaffolds.
Molecular generation, Diffusion models, Graph diffusion, Chemical validity, Discrete diffusion, Atom encoding, MOSES dataset, Multi-property generation, Scaffold extension, Drug discovery
Authors
Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong
Abstract
Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted because 2D molecular graphs are inherently discrete, existing models suffer from low chemical validity and struggle to satisfy target properties compared with 1D (sequence-based) models. In this work, we introduce MolHIT, a molecular graph generation framework that overcomes long-standing performance limitations of existing methods. MolHIT combines two components: the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion with additional categories that encode chemical priors, and decoupled atom encoding, which splits atom types according to their chemical roles. MolHIT achieves new state-of-the-art performance on the MOSES dataset, reaching near-perfect validity for the first time in graph diffusion and surpassing strong 1D baselines across multiple metrics. We further demonstrate strong performance on downstream tasks, including multi-property guided generation and scaffold extension.
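To make the hierarchical idea more concrete, here is a minimal Python sketch of a forward (noising) process for atom tokens. It assumes that the "additional categories" act as intermediate coarse states between a specific atom token and a fully masked token. Everything in it is hypothetical and is not the authors' implementation: the aromatic/aliphatic split of atom tokens (a stand-in for decoupled atom encoding), the group names such as `GROUP_C`, the `MASK` token, and the noise schedule, under which a token stays unchanged with probability 1 - t, is coarsened to its group with probability t - t², and is fully masked with probability t².

```python
import random
from typing import Dict, List

# HYPOTHETICAL decoupled atom vocabulary: one element maps to several tokens
# according to an assumed chemical role (aromatic vs. aliphatic).
# Each fine token also maps to an assumed coarse "prior" category.
FINE_TO_GROUP: Dict[str, str] = {
    "C_aromatic": "GROUP_C",
    "C_aliphatic": "GROUP_C",
    "N_aromatic": "GROUP_N",
    "N_aliphatic": "GROUP_N",
    "O": "GROUP_O",
    "F": "GROUP_HALOGEN",
    "Cl": "GROUP_HALOGEN",
}
MASK = "MASK"  # hypothetical absorbing state


def marginal_probs(t: float) -> Dict[str, float]:
    """Assumed schedule: probability that a token is still fine, coarsened
    to its group, or fully masked at diffusion time t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return {"fine": 1.0 - t, "group": t - t * t, "mask": t * t}


def forward_noise(atoms: List[str], t: float, rng: random.Random) -> List[str]:
    """Sample a noised sequence x_t ~ q(x_t | x_0), independently per token."""
    probs = marginal_probs(t)
    noised = []
    for atom in atoms:
        r = rng.random()  # uniform in [0, 1)
        if r < probs["mask"]:
            noised.append(MASK)  # top of the hierarchy: no information left
        elif r < probs["mask"] + probs["group"]:
            noised.append(FINE_TO_GROUP[atom])  # keep only the coarse prior
        else:
            noised.append(atom)  # token survives unchanged
    return noised


if __name__ == "__main__":
    rng = random.Random(0)
    mol = ["C_aromatic", "C_aromatic", "N_aromatic", "C_aliphatic", "O"]
    for t in (0.0, 0.5, 1.0):
        print(t, forward_noise(mol, t, rng))
```

Under these assumptions, a denoising network trained on such samples would learn to resolve `MASK` into a coarse category before committing to a specific atom token, which is one plausible way intermediate categories could inject chemical priors into generation.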