Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis
2026-02-20 • Computer Vision and Pattern Recognition
AI summary
The authors address missing data in 3D MRI, such as absent modalities in brain scans and absent slices in cardiac scans, which makes medical analysis harder. Instead of relying on manual indicators of what is missing, their method, called CoPeDiT, lets the model infer the missing state by itself. Its tokenizer, CoPeVAE, is trained with pretext tasks so that it learns to recognize incompleteness, and a 3D diffusion transformer, MDiT3D, uses that learned signal to fill in the gaps with more consistent anatomy. Tests on three large-scale MRI datasets show the approach outperforms existing methods, making MRI synthesis more robust and flexible.
MRI synthesis, missing data, latent diffusion model, tokenizer, transformer, completeness perception, 3D MRI, semantic consistency, medical image reconstruction, CoPeDiT
Authors
Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Daniel C. Alexander, Le Zhang
Abstract
Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance that specifies the missing state in detail to instruct generative models in synthesizing the missing MRIs. However, such manual indicators are not always available or reliable in real-world scenarios because clinical environments are unpredictable. Moreover, these explicit masks are not informative enough to guide improvements in semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and we design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis that uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at https://github.com/JK-Liu7/CoPeDiT.
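To make the notion of a "missing state" concrete, the following minimal sketch simulates incomplete studies and records which items are present. It is an illustration only and is not taken from the CoPeDiT code: the function name `simulate_missing`, the placeholder volume names, and the choice of a 0/1 presence vector are all assumptions. A presence vector of this kind is what an external mask would supply by hand, and it is also one plausible target for a self-supervised pretext task; the abstract does not state which pretext tasks CoPeVAE actually uses.

```python
import random


def simulate_missing(items, num_missing, seed=None):
    """Randomly hide `num_missing` entries of `items` (hypothetical helper).

    Returns (observed, presence): `observed` has None at hidden positions,
    and `presence` is a 0/1 list marking which entries survive.
    """
    if not 0 <= num_missing < len(items):
        raise ValueError("num_missing must leave at least one item observed")
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(items)), num_missing))
    presence = [0 if i in hidden else 1 for i in range(len(items))]
    observed = [None if i in hidden else item for i, item in enumerate(items)]
    return observed, presence


# Assumed brain study with four modalities; strings stand in for 3D volumes.
modalities = ["vol_A", "vol_B", "vol_C", "vol_D"]
observed, presence = simulate_missing(modalities, num_missing=2, seed=0)

# Assumed cardiac stack of eight slices with three of them missing.
slices = [f"slice_{k}" for k in range(8)]
obs_slices, slice_presence = simulate_missing(slices, num_missing=3, seed=1)

print(presence, sum(presence))
print(slice_presence, sum(slice_presence))
```

In a mask-guided pipeline, `presence` would be passed to the generator as an explicit input. Under the self-perceptive view argued for in the abstract, the model receives only `observed` and must work out the equivalent information on its own.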