SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

2026-06-08 | Computer Vision and Pattern Recognition

AI summary

The authors developed a new method called SemDINO to better detect and label how land changes over time using satellite images. Their approach uses two types of image features combined at multiple scales and a special transformer module to align images taken at different times. They also introduced techniques to reduce false changes caused by things like lighting or seasons. This method improves accuracy and works well even when the images have noise or complicated changes. Tests on public datasets show SemDINO outperforms existing methods.

Semantic Change Detection, Land-cover Change, Dual-branch Encoder, Transformer, Multi-scale Representation, Pseudo-changes, Remote Sensing, Feature Alignment, CNN, DINOv3
Authors
Xinyu Tong, Meihua Zhou, Jinxiao Sun, Yingjie Tang, Lei Wang
Abstract
Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify the semantic categories before and after each transition. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination differences, seasonal variation, and registration noise. To address these issues, we propose SemDINO, an end-to-end semantic change detection network that integrates a dual-branch encoder, multi-scale temporal interaction, semantic purification, change enhancement, and decoupled multi-task prediction into a unified framework. Specifically, we construct a dual-branch encoder that combines a CNN backbone with frozen DINOv3 features via gated pyramid fusion, yielding a rich multi-scale semantic representation. A multi-scale temporal bidirectional transformer interaction (M-TBTT) module then performs global cross-temporal feature alignment and information exchange. To further enhance genuine changes and suppress pseudo-changes, we introduce three cooperating modules: semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE). Finally, a multi-branch prediction head jointly outputs a binary change mask, bi-temporal semantic maps, and an edge-constraint output. Extensive experiments on public remote sensing change detection datasets show that SemDINO achieves better performance and generalization than state-of-the-art methods, especially in complex scenes with interference factors.
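The abstract names gated pyramid fusion but does not describe its internals. As a rough illustration only, the NumPy sketch below shows one common form of gating: a per-pixel sigmoid gate, computed from the concatenated CNN and DINOv3 features, blends the two branches at each pyramid level. Everything in it is an assumption rather than the authors' design: the function names (`resize_nearest`, `gated_fuse`, `gated_pyramid_fusion`), the single-scale frozen feature map, the nearest-neighbour resizing, and the premise that both branches have already been projected to the same channel count.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]


def gated_fuse(cnn_feat, vit_feat, w_gate, b_gate):
    """Fuse one pyramid level (hypothetical formulation).

    cnn_feat: (C, H, W) CNN features at this level.
    vit_feat: (C, h, w) frozen foundation-model features (assumed single scale).
    w_gate:   (C, 2C) weights of a 1x1 convolution producing the gate.
    b_gate:   (C,) gate bias.
    """
    _, h, w = cnn_feat.shape
    vit_rs = resize_nearest(vit_feat, h, w)
    stacked = np.concatenate([cnn_feat, vit_rs], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    logits = np.einsum("oc,chw->ohw", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)  # values in (0, 1), one per channel and pixel
    # Convex combination: gate -> 1 favours the CNN branch, gate -> 0 the frozen branch.
    return gate * cnn_feat + (1.0 - gate) * vit_rs


def gated_pyramid_fusion(cnn_pyramid, vit_feat, params):
    """Apply gated_fuse independently at every pyramid level."""
    return [gated_fuse(f, vit_feat, w, b) for f, (w, b) in zip(cnn_pyramid, params)]


# Toy usage with random features at three scales (shapes chosen for illustration).
rng = np.random.default_rng(0)
C = 8
cnn_pyramid = [rng.standard_normal((C, s, s)) for s in (32, 16, 8)]
vit_feat = rng.standard_normal((C, 16, 16))
params = [(0.1 * rng.standard_normal((C, 2 * C)), np.zeros(C)) for _ in cnn_pyramid]
fused = gated_pyramid_fusion(cnn_pyramid, vit_feat, params)
print([f.shape for f in fused])
```

Because the gate produces a convex combination, each fused value lies between the two branch values at that location. In a trained network the gate weights would be learned, and the frozen branch would receive no gradient updates.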