OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
2026-04-27 • Computer Vision and Pattern Recognition
AI summary
The authors developed OmniShotCut, a method that detects shot changes in videos by modeling relationships between shots rather than inspecting individual frames in isolation. A shot-query-based dense video Transformer jointly predicts shot ranges and their relations to detect boundaries more reliably. To obtain accurate training data, they built a fully synthetic pipeline that generates diverse transition types with exactly known boundaries. They also introduce a new benchmark, OmniShotCutBench, for evaluating shot boundary detection methods across a wide range of video domains.
Shot Boundary Detection · Video Transformer · Synthetic Data Generation · Structured Relational Prediction · Shot Transitions · Benchmark Dataset · Video Segmentation · Dense Video Models
Authors
Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng
Abstract
Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To address these limitations, we propose OmniShotCut, which formulates SBD as structured relational prediction: a shot-query-based dense video Transformer jointly estimates shot ranges together with intra-shot and inter-shot relations. To avoid imprecise manual labeling, we adopt a fully synthetic transition-synthesis pipeline that automatically reproduces the major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark enabling holistic and diagnostic evaluation.
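The key property of the synthetic pipeline is that transition boundaries are known exactly by construction, so no manual annotation is needed. The paper does not publish its generator, but the idea for one transition family (a gradual cross-dissolve) can be sketched as follows; the function name, blending schedule, and label format here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def synthesize_dissolve(clip_a, clip_b, transition_len):
    """Join two clips with a linear cross-dissolve (hypothetical sketch).

    clip_a, clip_b: arrays of shape (T, H, W, C) with values in [0, 1].
    Returns (video, (start, end)): frames start .. end-1 form the
    transition region, a label that is exact by construction.
    """
    # Blend weights strictly between 0 and 1 (endpoints belong to the clips).
    alphas = np.linspace(0.0, 1.0, transition_len + 2)[1:-1]
    # Each transition frame mixes the last frame of A with the first of B.
    blend = np.stack([(1 - a) * clip_a[-1] + a * clip_b[0] for a in alphas])
    video = np.concatenate([clip_a, blend, clip_b], axis=0)
    start = len(clip_a)
    return video, (start, start + transition_len)

# Toy usage: black clip dissolving into a white clip.
a = np.zeros((8, 4, 4, 3))
b = np.ones((8, 4, 4, 3))
video, (start, end) = synthesize_dissolve(a, b, transition_len=4)
print(video.shape, start, end)  # (20, 4, 4, 3) 8 12
```

A real pipeline would additionally parameterize the transition (length, easing curve, wipes, fades to color, etc.) and composite clips drawn from diverse source footage, but the labeling principle is the same: the boundary indices fall out of the synthesis itself.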