TRACE: Object Motion Editing in Videos with First-Frame Trajectory Guidance
2026-03-26 • Computer Vision and Pattern Recognition
AI summary
The authors study how to change the path of a moving object in a video without messing up the rest of the scene. They created Trace, a tool where users draw the desired path in one frame, and the program adjusts the entire video's object movement accordingly. Their method works in two steps: first, it translates the path design to match each frame even if the camera moves; second, it recreates the object following these new paths while keeping everything else in the video unchanged. Tests show their approach makes smoother and more accurate edits compared to other recent methods.
video editing, object motion, trajectory control, camera motion, video synthesis, cross-view transformation, temporal consistency, motion-conditioned re-synthesis, video-to-video editing, anchor frame
Authors
Quynh Phung, Long Mai, Cusuh Ham, Feng Liu, Jia-Bin Huang, Aniruddha Mahapatra
Abstract
We study object motion path editing in videos, where the goal is to alter a target object's trajectory while preserving the original scene content. Prior video editing methods primarily manipulate appearance or rely on point-track-based trajectory control, which is often difficult for users to provide at inference time, especially in videos with camera motion. In contrast, we offer a practical, easy-to-use approach to controllable object-centric motion editing. We present Trace, a framework that lets users design the desired trajectory in a single anchor frame and then synthesizes a temporally consistent edited video. Our approach addresses this task with a two-stage pipeline: a cross-view motion transformation module that maps the first-frame path design to frame-aligned box trajectories under camera motion, and a motion-conditioned video re-synthesis module that follows these trajectories to regenerate the object while preserving the remaining content of the input video. Experiments on diverse real-world videos show that our method produces more coherent, realistic, and controllable motion edits than recent image-to-video and video-to-video methods.
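To make the first stage concrete, the sketch below shows one simple way a trajectory drawn in the anchor frame could be re-expressed in each frame's coordinates under camera motion. It is an illustrative assumption, not the paper's actual cross-view motion transformation module: camera motion is modeled here as a per-frame 2×3 affine map from anchor-frame to frame-t coordinates, and the function name `propagate_boxes` and the box parameterization are hypothetical.

```python
import numpy as np

def propagate_boxes(anchor_path, affines):
    """Warp an anchor-frame box trajectory into each frame's coordinates.

    anchor_path: list of (cx, cy, w, h) boxes over time, all expressed in
                 the anchor (first) frame's coordinate system.
    affines: list of 2x3 arrays [A | t] mapping anchor-frame coordinates to
             frame t's coordinates (identity for t = 0), one per time step.
             (Assumed affine camera-motion model; the paper does not
             specify the transform family.)
    """
    boxes = []
    for (cx, cy, w, h), M in zip(anchor_path, affines):
        A, t = M[:, :2], M[:, 2]
        # Warp the box center into frame t.
        cx2, cy2 = A @ np.array([cx, cy]) + t
        # Scale the box size by the linear scale of the affine map.
        s = np.sqrt(abs(np.linalg.det(A)))
        boxes.append((float(cx2), float(cy2), w * s, h * s))
    return boxes
```

For example, with an identity map at t = 0 and a map that doubles coordinates and shifts by 10 px at t = 1, a box centered at (5, 5) with size 4×4 maps to itself at t = 0 and to a box centered at (20, 10) with size 8×8 at t = 1. The second stage, motion-conditioned re-synthesis, would then consume these frame-aligned boxes as conditioning while regenerating the object.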