SAID: Accelerating Diffusion-Based Language Models via Scaffold-Aware Iterative Decoding
2026-06-03 • Computation and Language
AI summary
The authors studied a type of language model called diffusion large language models (DLLMs), which generate text by gradually denoising corrupted sequences. These models are slow because they need many denoising steps to produce good results. To speed this up, the authors introduced SAID, a method that first spends computation on important 'scaffold' tokens to establish the main meaning, then completes the more predictable parts of the text in fewer steps. They tested SAID on math, coding, and knowledge tasks and found it can make the models up to 9.1 times faster while keeping performance competitive.
Diffusion Large Language Models, Non-autoregressive generation, Denoising, Bidirectional context, Iterative decoding, Scaffold tokens, Block-wise diffusion, Confidence-Hierarchical Layered Generation, LLaDA model
Authors
Na Li, Chengda Wang, Mingju Gao, Hao Tang
Abstract
Diffusion large language models (DLLMs) enable non-autoregressive generation by iteratively denoising corrupted token sequences with bidirectional context. Despite their ability to update multiple positions in parallel, inference remains costly due to the many denoising steps required for high-quality generation. We propose SAID, a Scaffold-Aware Iterative Decoding framework that accelerates DLLMs by reallocating computation across tokens. SAID first spends denoising computation on scaffold tokens to establish the coarse semantic structure, and then completes predictable detail tokens with fewer steps. We further adapt SAID to block-wise diffusion decoding and introduce Confidence-Hierarchical Layered Generation (CHLG), which assigns additional steps only to low-confidence tokens. Experiments on LLaDA-8B and LLaDA 1.5 across math, coding, and knowledge benchmarks show that SAID significantly accelerates DLLM inference with a maximum speedup of 9.1x while maintaining competitive performance. Our code is publicly available: https://github.com/TH-AI-Lab-PKU/SAID.
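The idea of assigning additional steps only to low-confidence tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `allocate_steps`, the threshold, the step counts, and the confidence values are all hypothetical, chosen only to show how a per-token step budget might depend on confidence.

```python
from typing import List


def allocate_steps(
    confidences: List[float],
    threshold: float = 0.9,   # hypothetical cutoff, not taken from the paper
    base_steps: int = 1,      # hypothetical budget given to every token
    extra_steps: int = 2,     # hypothetical extra budget for uncertain tokens
) -> List[int]:
    """Give every token base_steps; add extra_steps only where confidence < threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [
        base_steps + (extra_steps if c < threshold else 0)
        for c in confidences
    ]


# Made-up per-token confidences for a five-token block.
confidences = [0.98, 0.42, 0.95, 0.61, 0.99]
budget = allocate_steps(confidences)
print(budget)  # [1, 3, 1, 3, 1]
```

In this toy setting, only the two uncertain tokens receive the larger budget, while the confident tokens are finalized with the base budget. How SAID actually selects scaffold tokens and schedules steps is described in the paper and the linked repository.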