Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
2026-02-17 • Robotics
Robotics • Computer Vision and Pattern Recognition • Machine Learning
AI summary
The authors developed Dex4D, a system that teaches robots to manipulate many different objects in diverse ways without task-specific training. They trained a single policy in simulation to move objects from any starting pose to any desired pose using 3D point tracking. This policy can be deployed in the real world immediately, guided by video-based object tracking, without extra tuning. Their experiments show it works well across many tasks and objects, even ones the robot has never seen before.
dexterous manipulation • simulation training • 3D point tracking • robot policy • zero-shot transfer • closed-loop control • domain-agnostic policy • robot-object interaction • pose manipulation
Authors
Yuxuan Kuang, Sungjae Park, Katerina Fragkiadaki, Shubham Tulsiani
Abstract
Learning generalist policies capable of accomplishing a plethora of everyday tasks remains an open challenge in dexterous manipulation. In particular, collecting large-scale manipulation data via real-world teleoperation is expensive and difficult to scale. While learning in simulation provides a feasible alternative, designing multiple task-specific environments and rewards for training is similarly challenging. We propose Dex4D, a framework that instead leverages simulation to learn task-agnostic dexterous skills that can be flexibly recomposed to perform diverse real-world manipulation tasks. Specifically, Dex4D learns a domain-agnostic, 3D point-track-conditioned policy capable of manipulating any object to any desired pose. We train this 'Anypose-to-Anypose' policy in simulation across thousands of objects with diverse pose configurations, covering a broad space of robot-object interactions that can be composed at test time. At deployment, the policy transfers zero-shot to real-world tasks without finetuning, simply by prompting it with desired object-centric point tracks extracted from generated videos. During execution, Dex4D uses online point tracking for closed-loop perception and control. Extensive experiments in simulation and on real robots show that our method enables zero-shot deployment across diverse dexterous manipulation tasks and yields consistent improvements over prior baselines. Furthermore, we demonstrate strong generalization to novel objects, scene layouts, backgrounds, and trajectories, highlighting the robustness and scalability of the proposed framework.
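To make the deployment loop concrete, below is a minimal Python sketch of how a point-track-conditioned policy might be run closed-loop: the object's 3D points are re-tracked every control step, and the policy is conditioned on the next waypoint of a desired object-centric track. This is an illustrative assumption of the overall structure, not the authors' actual API; the names (track_points, policy, closed_loop_rollout, DummyEnv) and the toy dynamics are all hypothetical stand-ins for the learned policy and video point tracker described in the abstract.

```python
# Hypothetical sketch of a point-track-conditioned closed-loop rollout,
# in the spirit of Dex4D's deployment phase. All components are toy
# stand-ins; the real system uses a learned policy and a video tracker.

import numpy as np

K = 32   # number of tracked object points (assumed)
T = 100  # length of the desired point track (assumed)


def track_points(obs: dict) -> np.ndarray:
    """Stand-in for online 3D point tracking (in the paper, a video
    point-tracking model re-run each step). Here we read the points
    directly from a dummy observation."""
    return obs["object_points"]  # (K, 3)


def policy(points_now, points_goal, proprio) -> np.ndarray:
    """Stand-in for the learned 'Anypose-to-Anypose' policy, which is
    conditioned on current and desired object point tracks. This toy
    version outputs a mean displacement toward the goal as the action."""
    return (points_goal - points_now).mean(axis=0)  # (3,)


def closed_loop_rollout(env, desired_tracks: np.ndarray, horizon: int = 200):
    """Re-track points every step and condition the policy on the next
    waypoint of the desired track (closed-loop perception and control).

    desired_tracks: (T, K, 3) object-centric point trajectory extracted
    offline, e.g. from a generated video, in the robot/camera frame.
    """
    obs = env.reset()
    for t in range(horizon):
        points_now = track_points(obs)                 # (K, 3)
        goal_idx = min(t, len(desired_tracks) - 1)
        points_goal = desired_tracks[goal_idx]         # (K, 3) next waypoint
        action = policy(points_now, points_goal, obs["proprio"])
        obs = env.step(action)
        # Stop once tracked points are near the final goal waypoint.
        err = np.linalg.norm(points_now - desired_tracks[-1], axis=-1).mean()
        if err < 1e-2:
            break
    return obs


class DummyEnv:
    """Toy environment whose object points move by the commanded action."""

    def __init__(self):
        self.points = np.zeros((K, 3))

    def reset(self):
        self.points = np.random.randn(K, 3) * 0.1
        return {"object_points": self.points, "proprio": np.zeros(7)}

    def step(self, action):
        self.points = self.points + action
        return {"object_points": self.points, "proprio": np.zeros(7)}


if __name__ == "__main__":
    env = DummyEnv()
    goal_points = np.random.randn(K, 3) * 0.1
    # A straight-line track from the origin to the goal, shape (T, K, 3).
    desired = np.linspace(np.zeros((K, 3)), goal_points, T)
    closed_loop_rollout(env, desired)
```

The key design point the sketch mirrors is that the policy never sees a task label: it is prompted purely by the desired point track, so swapping in a different track (from a different generated video) recomposes the same policy for a new task.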