Crystalite: A Lightweight Transformer for Efficient Crystal Modeling

2026-04-02 | Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors developed Crystalite, a model that generates crystal structures more efficiently than earlier approaches based on equivariant graph neural networks. They introduced a compact way to represent atoms, called Subatomic Tokenization, and added a Geometry Enhancement Module that helps the model account for the periodic, repeating geometry of crystals. With these changes, their Transformer-based diffusion model samples faster while still producing high-quality crystal structures. Crystalite achieved state-of-the-art results on crystal structure prediction benchmarks and the best S.U.N. discovery score among the baselines evaluated.

Generative models, Equivariant graph neural networks, Crystal structure prediction, Transformer, Diffusion model, Subatomic Tokenization, Geometry Enhancement Module, Periodic boundary conditions, Attention mechanism, S.U.N. discovery score
Authors
Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić, Jan-Willem van de Meent
Abstract
Generative models for crystalline materials often rely on equivariant graph neural networks, which capture geometric structure well but are costly to train and slow to sample. We present Crystalite, a lightweight diffusion Transformer for crystal modeling built around two simple inductive biases. The first is Subatomic Tokenization, a compact, chemically structured atom representation that replaces high-dimensional one-hot encodings and is better suited to continuous diffusion. The second is the Geometry Enhancement Module (GEM), which injects periodic minimum-image pair geometry directly into attention through additive geometric biases. Together, these components preserve the simplicity and efficiency of a standard Transformer while making it better matched to the structure of crystalline materials. Crystalite achieves state-of-the-art results on crystal structure prediction benchmarks and in de novo generation, attaining the best S.U.N. discovery score among the evaluated baselines while sampling substantially faster than geometry-heavy alternatives.
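To make the idea of "periodic minimum-image pair geometry" entering attention as an additive bias more concrete, here is a minimal sketch in plain Python. It is not the authors' GEM implementation. The function names, the choice of a linear distance penalty `-slope * distance` as the bias, and the `slope` parameter are all assumptions made for illustration; the abstract does not specify the form of the bias.

```python
import itertools
import math


def minimum_image_distance(frac_a, frac_b, lattice):
    """Shortest distance between two atoms under periodic boundary conditions.

    frac_a, frac_b: fractional coordinates (length-3 sequences).
    lattice: 3x3 nested list whose rows are the lattice vectors.
    """
    # Wrap the fractional displacement into [-0.5, 0.5] along each axis.
    delta = [a - b for a, b in zip(frac_a, frac_b)]
    delta = [d - round(d) for d in delta]
    best = math.inf
    # Also check the 26 neighbouring images: wrapping alone can miss the
    # nearest image in skewed cells. This is adequate for reasonably
    # reduced cells, not a guarantee for arbitrary ones.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        f = [d + s for d, s in zip(delta, shift)]
        # Cartesian vector = f @ lattice (rows of lattice are cell vectors).
        cart = [sum(f[i] * lattice[i][j] for i in range(3)) for j in range(3)]
        best = min(best, math.sqrt(sum(c * c for c in cart)))
    return best


def biased_attention_weights(scores, frac_coords, lattice, slope=1.0):
    """Row-wise softmax over content scores plus an additive geometric bias.

    scores: N x N nested list of pre-softmax logits (e.g. q.k / sqrt(d)).
    The bias -slope * distance is a hypothetical choice for illustration.
    """
    n = len(frac_coords)
    weights = []
    for i in range(n):
        logits = [
            scores[i][j]
            - slope * minimum_image_distance(frac_coords[i], frac_coords[j], lattice)
            for j in range(n)
        ]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

In a cubic cell with a 4 Å edge, atoms at fractional x = 0.05 and x = 0.95 are 0.4 Å apart through the cell boundary rather than 3.6 Å apart inside the cell. With identical content scores, the biased softmax therefore gives that pair more attention weight than a pair separated by 1.8 Å.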