Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

2026-05-31, Computation and Language

AI summary

The authors study masked diffusion language models (MDLMs), which re-predict every token at each denoising step but, under standard samplers, lock in each token once it is revealed, losing the chance to fix mistakes. They propose D3IM, a parameter-free sampler that lets the model revise already-visible tokens directly, without additional modules or auxiliary passes. D3IM exposes a problem they call preservation bias, where the model tends to reproduce its own wrong committed tokens instead of correcting them, and they address it with SCOPE, a lightweight post-training procedure that simulates D3IM's sampling process. On LLaDA-8B at 64 denoising steps, the combination improves over standard unmasking on GSM8K, MATH-500, HumanEval, and MBPP, with gains that grow as more denoising steps are used on math and HumanEval.

masked diffusion language models, token sampling, denoising steps, preservation bias, D3IM sampler, SCOPE training, language model revision, LLaDA-8B, GSM8K, HumanEval
Authors
Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu, Yue Dong, Greg Ver Steeg
Abstract
Masked diffusion language models (MDLMs) re-predict every position at each denoising step, but standard samplers commit tokens once revealed, leaving this revision capability unused. Existing approaches either add heuristic or learned mechanisms to revise committed tokens, or remask them back to [MASK] before re-predicting; a principled sampler that directly revises visible tokens without auxiliary modules remains underexplored. We introduce D3IM, a parameter-free sampler derived as a corrector-style reverse update that permits direct visible-to-visible revision without additional modules or auxiliary passes. D3IM also reveals a model-side obstacle we term preservation bias: the model tends to reproduce its own wrong committed tokens rather than correct them. We address this with SCOPE (Self-Conditioned On Prediction Errors), a lightweight post-training procedure that simulates D3IM's sampling process. On LLaDA-8B at 64 denoising steps, SCOPE+D3IM improves over the original LLaDA-8B with standard unmasking by +13.0 on GSM8K (68.3%), +4.8 on MATH-500 (23.6%), +15.3 on HumanEval (29.3%), and +10.4 on MBPP (30.8%), with gains that increase as more denoising steps are used on math and HumanEval.
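
The contrast between commit-once unmasking and direct visible-to-visible revision can be illustrated with a toy sketch. This is not D3IM itself: the abstract does not give the corrector-style update, so the code below only assumes a deterministic stand-in denoiser (`toy_model`), a fixed reveal order, and a five-token target, all of which are hypothetical names chosen for illustration. The `preserve_visible` flag mimics preservation bias by making the stand-in model echo whatever token is already committed.

```python
MASK = None
TARGET = ["2", "+", "3", "=", "5"]  # hypothetical target sequence


def toy_model(seq, preserve_visible=False):
    """Hypothetical denoiser: returns one predicted token per position.

    It guesses the last token wrong ("6") until at least 3 tokens are
    visible, mimicking an early mistake made with little context.
    """
    n_visible = sum(tok is not MASK for tok in seq)
    preds = []
    for i, tok in enumerate(seq):
        if preserve_visible and tok is not MASK:
            preds.append(tok)  # preservation bias: echo the committed token
        elif i == len(seq) - 1 and n_visible < 3:
            preds.append("6")  # early, low-context mistake
        else:
            preds.append(TARGET[i])
    return preds


def sample(steps, revise, preserve_visible=False):
    """Toy sampler (an assumption, not the paper's update rule).

    revise=False: tokens are committed once revealed and never change.
    revise=True: visible tokens are overwritten by the current prediction.
    """
    seq = [MASK] * len(TARGET)
    order = [4, 0, 1, 2, 3]  # reveal the last position first
    per_step = -(-len(seq) // steps)  # ceiling division
    for s in range(steps):
        preds = toy_model(seq, preserve_visible)
        if revise:
            # Visible-to-visible revision: no remasking, no extra module.
            seq = [p if tok is not MASK else tok for tok, p in zip(seq, preds)]
        for i in order[s * per_step:(s + 1) * per_step]:
            seq[i] = preds[i]  # reveal newly unmasked positions
    return seq


if __name__ == "__main__":
    print("commit-once:       ", sample(5, revise=False))
    print("revising:          ", sample(5, revise=True))
    print("revising + biased: ", sample(5, revise=True, preserve_visible=True))
```

In this toy setting the commit-once run keeps the early wrong token, the revising run replaces it once enough context is visible, and the revising run with a preservation-biased model keeps the error anyway. That last case is the situation the abstract says SCOPE is meant to address on the model side.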