What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation

2026-05-29 • Computation and Language

Computation and Language, Artificial Intelligence

AI summary

The authors study masked diffusion language models (MDLMs), a type of language model, for turning graphs into text. They find that MDLMs follow a distinctive unmasking order: they resolve entities first, then relational words, and structural tokens last. Autoregressive language models, by contrast, write text left to right. The authors also find that supervised fine-tuning breaks this pattern by fixing sentence-ending tokens too early, which effectively fixes the output length and can cause omitted or hallucinated information. To fix this, they propose reducing the confidence of structural tokens during decoding, a training-free change that recovers 9.4 BLEU-4 points. Finally, they build a new model, Graph-LLaDA, that explicitly incorporates relational graph structure through a Graph Transformer encoder. Cross-dataset evaluation shows that earlier baselines overfit to dataset-specific patterns, while LLM- and MDLM-based approaches generalize significantly better.
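The trajectory analysis amounts to recording, for each output position, the decoding step at which it was unmasked, and then comparing token categories. The following is a minimal sketch under stated assumptions, not the authors' code: the function `mean_unmask_step_by_category`, the three category labels, and the toy trajectory are hypothetical, and function words are grouped with relational words as in the abstract.

```python
from collections import defaultdict
from statistics import mean


def mean_unmask_step_by_category(unmask_step, categories, num_steps):
    """Return the average normalized step (0.0 = first step, 1.0 = last step)
    at which tokens of each (hypothetical) category were unmasked."""
    denom = max(num_steps - 1, 1)  # avoid division by zero for one-step decoding
    buckets = defaultdict(list)
    for step, category in zip(unmask_step, categories):
        buckets[category].append(step / denom)
    return {category: mean(values) for category, values in buckets.items()}


# Hypothetical 5-step trajectory for the sentence "Alan Bean was born in Wheeler , Texas ."
tokens = ["Alan", "Bean", "was", "born", "in", "Wheeler", ",", "Texas", "."]
categories = ["entity", "entity", "relational", "relational", "relational",
              "entity", "structural", "entity", "structural"]
unmask_step = [0, 0, 2, 1, 2, 1, 3, 1, 4]  # step at which each token was unmasked
assert len(tokens) == len(categories) == len(unmask_step)

scores = mean_unmask_step_by_category(unmask_step, categories, num_steps=5)
order = sorted(scores, key=scores.get)  # categories from earliest to latest
print(order)  # ['entity', 'relational', 'structural']
```

The toy trajectory was chosen to reproduce the entities-first, structure-last ordering, so the output only illustrates the bookkeeping; on real generations the category labels would come from whatever tagging scheme the paper uses, which may differ from this one.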

masked diffusion language models, graph-to-text generation, autoregressive language models, iterative decoding, supervised fine-tuning, BLEU score, Graph Transformer, LLaDA, relational graph structure

Authors

Qing Wang, Jacob Devasier, Chengkai Li

Abstract

We present the first systematic study of masked diffusion language models (MDLMs) for graph-to-text generation. We analyze MDLM generation trajectories (the order in which tokens are unmasked during iterative decoding) and find that, unlike autoregressive LLMs, which generate text left to right, MDLMs naturally prioritize entities first, followed by relational and function words, with structural tokens resolved last. We further identify a previously undocumented failure mode of supervised fine-tuning (SFT): SFT disrupts this strategy by anchoring structural sentence-ending tokens early in the decoding trajectory, effectively fixing the output length, which can lead to omitted or hallucinated information. To address this, we propose lambda-scaled structural decoding, a training-free, inference-time modification that downweights the confidence of structural tokens and recovers 9.4 BLEU-4 points. Finally, we introduce Graph-LLaDA, which integrates a Graph Transformer encoder into LLaDA's decoding process to explicitly incorporate relational graph structure. Cross-dataset evaluation on LAGRANGE reveals that previous baselines overfit to dataset-specific patterns, while LLM- and MDLM-based approaches generalize significantly better.
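To make the decoding modification concrete, here is a minimal sketch of one step of confidence-based unmasking with structural tokens down-weighted. It assumes, without confirmation from the abstract, that each masked position's confidence is the probability of its argmax token, that "lambda-scaled" means multiplying that confidence by a factor λ < 1 when the argmax token is structural, and that the k highest-scoring positions are committed at each step. The function `select_positions`, the set `structural_ids`, and the toy probabilities are hypothetical.

```python
def select_positions(probs, masked, structural_ids, lam, k):
    """One step of confidence-based unmasking with structural tokens down-weighted.

    probs: per-position probability lists over a (toy) vocabulary.
    masked: per-position flags, True where the position is still masked.
    structural_ids: vocabulary ids treated as structural (hypothetical set).
    lam: scaling factor in (0, 1]; lam=1.0 gives plain confidence ranking.
    k: number of positions to unmask at this step.
    Returns (position, token_id) pairs to commit, highest adjusted confidence first.
    """
    candidates = []
    for pos, (p, is_masked) in enumerate(zip(probs, masked)):
        if not is_masked:
            continue  # already committed positions are not reconsidered
        token = max(range(len(p)), key=p.__getitem__)  # greedy prediction
        conf = p[token]
        if token in structural_ids:
            conf *= lam  # assumed form of the lambda scaling
        candidates.append((conf, pos, token))
    # Sort by descending adjusted confidence; break ties by position.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [(pos, token) for _, pos, token in candidates[:k]]


# Toy vocabulary: 0 = "Alan", 1 = "Texas", 2 = "." (structural).
probs = [[0.6, 0.3, 0.1], [0.1, 0.5, 0.4], [0.05, 0.05, 0.9]]
masked = [True, True, True]
print(select_positions(probs, masked, {2}, lam=1.0, k=1))  # [(2, 2)]
print(select_positions(probs, masked, {2}, lam=0.3, k=1))  # [(0, 0)]
```

With `lam=1.0` the highly confident sentence-ending token is committed first, which is the early anchoring the abstract attributes to SFT models; with a smaller λ the content token is committed first and the period is deferred. Because only the ranking of positions changes, the model weights are untouched, consistent with the method being described as training-free.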
