CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

2026-04-09 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
AI summary

The authors created CrashSight, a large dataset to help computers understand car crashes from roadside cameras, not just from a single car's view. It includes videos of crashes with many questions that test if models can recognize what happened, why it happened, and what came next. They tested 8 advanced vision-language models and found that while these models are good at describing scenes, they struggle with understanding the timing and causes of crashes. Their work provides a new way to evaluate and improve how machines perceive traffic accidents to support safer autonomous driving.

vision-language models, autonomous driving, roadside cameras, crash analysis, temporal reasoning, causal reasoning, scene understanding, benchmark dataset, infrastructure-assisted perception, traffic safety
Authors
Rui Gan, Junyi Ma, Pei Li, Xingyou Yang, Kai Chen, Sikai Chen, Bin Ran
Abstract
Cooperative autonomous driving requires traffic scene understanding from both vehicle and infrastructure perspectives. While vision-language models (VLMs) show strong general reasoning capabilities, their performance in safety-critical traffic scenarios remains insufficiently evaluated due to the ego-vehicle focus of existing benchmarks. To bridge this gap, we present CrashSight, a large-scale vision-language benchmark for roadway crash understanding using real-world roadside camera data. The dataset comprises 250 crash videos, annotated with 13K multiple-choice question-answer pairs organized under a two-tier taxonomy. Tier 1 evaluates the visual grounding of scene context and involved parties, while Tier 2 probes higher-level reasoning, including crash mechanics, causal attribution, temporal progression, and post-crash outcomes. We benchmark 8 state-of-the-art VLMs and show that, despite strong scene description capabilities, current models struggle with temporal and causal reasoning in safety-critical scenarios. We provide a detailed analysis of failure scenarios and discuss directions for improving VLM crash understanding. The benchmark provides a standardized evaluation framework for infrastructure-assisted perception in cooperative autonomous driving. The CrashSight benchmark, including the full dataset and code, is accessible at https://mcgrche.github.io/crashsight.
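
As an illustration of how multiple-choice results could be broken down along the two-tier taxonomy described above, the following is a minimal sketch in Python. It is not the CrashSight evaluation code: the record fields (`id`, `category`, `answer`), the category identifiers, and both function names are hypothetical, chosen only to mirror the wording of the abstract. Accuracy is taken to be the fraction of questions answered correctly, and a missing prediction is assumed to count as wrong.

```python
from collections import defaultdict

# Hypothetical category identifiers, derived only from the abstract's wording.
TAXONOMY = {
    "tier1": ["scene_context", "involved_parties"],
    "tier2": [
        "crash_mechanics",
        "causal_attribution",
        "temporal_progression",
        "post_crash_outcomes",
    ],
}


def accuracy_by_category(records, predictions):
    """Return {category: accuracy} for multiple-choice questions.

    records:     list of dicts with assumed keys 'id', 'category', 'answer'.
    predictions: dict mapping question id to the option the model chose.
    A question with no prediction is counted as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["category"]] += 1
        if predictions.get(rec["id"]) == rec["answer"]:
            correct[rec["category"]] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


def accuracy_by_tier(records, predictions):
    """Return {tier: accuracy}, pooling all questions within each tier."""
    cat_to_tier = {c: t for t, cats in TAXONOMY.items() for c in cats}
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        tier = cat_to_tier[rec["category"]]
        total[tier] += 1
        if predictions.get(rec["id"]) == rec["answer"]:
            correct[tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total}
```

Reporting Tier 1 and Tier 2 separately, rather than a single pooled score, would make a gap between scene description and temporal or causal reasoning visible in the numbers.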