EventDrive: Event Cameras for Vision-Language Driving Intelligence

2026-06-16 • Computer Vision and Pattern Recognition

AI summary

The authors explore how event cameras, which register asynchronous brightness changes with microsecond latency, can help self-driving cars understand their surroundings better than conventional frame-based cameras alone. They created EventDrive, a benchmark and model suite that combines event-camera data with RGB frames and language across tasks such as motion-state recognition, trajectory forecasting, and planning. Their model, EventDrive-VLM, is designed to fuse the fast event data with slower video frames for downstream reasoning. They report that event streams improve temporal precision, motion awareness, and robustness, which makes them a valuable complement to RGB in autonomous driving.

event cameras, autonomous driving, vision-language models, trajectory forecasting, motion recognition, event streams, RGB frames, temporal precision, multi-horizon event pyramid, mixture-of-experts

Authors

Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi

Abstract

Event cameras sense the world through asynchronous brightness changes with microsecond latency and high dynamic range, offering motion fidelity far beyond frame-based sensors and capturing temporal structure that conventional exposures often miss. These properties make events a powerful complement to RGB in autonomous driving, especially under blur, glare, and rapid motion, where frame-based perception can become unreliable. However, existing event-aware vision-language models remain limited to generic perception and do not reveal how event sensing contributes to reasoning and decision-making across the full driving loop. We present EventDrive, a large-scale benchmark and model suite that unifies event streams, RGB frames, and language supervision across four core dimensions: Perception, Understanding, Prediction, and Planning, covering captions, structured QA, grounding, motion-state recognition, trajectory forecasting, and planning tasks. Building on this foundation, EventDrive-VLM introduces a multi-horizon event pyramid and a temporal-horizon mixture-of-experts module to adaptively encode and fuse asynchronous and frame-based information for downstream reasoning. Comprehensive evaluation across diverse tasks shows that event streams provide substantial gains in temporal precision, motion awareness, and robustness, bringing event sensing into the center of driving intelligence.
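
For readers new to event data, the following is a minimal illustrative sketch of how an asynchronous event stream might be summarized over several time horizons. It is not the authors' multi-horizon event pyramid, whose design the abstract does not specify. The event layout (columns x, y, timestamp, polarity), the function names `accumulate_events` and `multi_horizon_stack`, and the horizon values are all assumptions made for illustration.

```python
import numpy as np

def accumulate_events(events, height, width, t_end, horizon):
    """Sum signed polarities per pixel for events with t in (t_end - horizon, t_end].

    Hypothetical helper: `events` is assumed to be an (N, 4) array with
    columns x, y, timestamp, polarity (+1 or -1).
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    mask = (t > t_end - horizon) & (t <= t_end)
    frame = np.zeros((height, width))
    # np.add.at accumulates correctly when the same pixel fires more than once.
    np.add.at(frame, (y[mask], x[mask]), p[mask])
    return frame

def multi_horizon_stack(events, height, width, t_end, horizons):
    """Stack one accumulated frame per horizon, shortest window first (assumed order)."""
    return np.stack(
        [accumulate_events(events, height, width, t_end, h) for h in horizons]
    )

# Toy stream: x, y, timestamp in microseconds, polarity.
events = np.array([
    [0, 0, 100.0, 1],
    [1, 0, 900.0, -1],
    [1, 1, 990.0, 1],
    [1, 1, 999.0, 1],
])
stack = multi_horizon_stack(
    events, height=2, width=2, t_end=1000.0, horizons=[20.0, 200.0, 1000.0]
)
print(stack.shape)  # one 2x2 frame per horizon
```

Short horizons keep only the most recent, fastest changes, while long horizons retain slower context; a learned model could then weight these views differently depending on the scene.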
