Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control
2026-05-22 • Robotics
AI summary
The authors developed a robotic system that picks strawberries by combining a special camera program to spot fruit, a smart control method trained through simulations, and real robot execution. Their vision program improved how well the robot can identify strawberries even when things are messy or cluttered. The control method helped the robot move smoothly and accurately while reaching for and picking the strawberries. In tests, the system successfully picked many strawberries with high accuracy, showing that using simulated training and tailored perception can effectively guide robots in farming tasks.
closed-loop robotic system, instance segmentation, YOLO architecture, deep reinforcement learning, Proximal Policy Optimization (PPO), UR10e manipulator, ROS (Robot Operating System), Isaac Lab simulation, inverse kinematics, agricultural robotics
Authors
Al Bashir, Shao-Yang Chang, Partho Ghose, Prem Raj, Chen-Kang Huang, Azlan Zahid
Abstract
This study presents a closed-loop robotic strawberry harvesting system that combines a robust vision module, simulation-trained deep reinforcement learning (DRL) control, and ROS-based real-robot execution. For perception, we propose HRAttnEdge-YOLO26-seg, a modified YOLO26-seg architecture that incorporates a high-resolution P2 branch, segmentation-path attention, and edge-supervised prototype learning to improve instance segmentation in cluttered scenes. For control, we train a target-conditioned Proximal Policy Optimization (PPO) policy in Isaac Lab to produce smooth joint-position commands for a UR10e manipulator, then deploy it on the physical UR10e robot for target-fruit reaching and harvesting. This simulation-based approach reduces hardware dependency, lowers development cost, and allows scalable policy training without exhaustive physical trials before real deployment. The proposed vision model achieved the highest overall performance among the evaluated methods. On both self-collected and public datasets, the model showed a 10 to 14% improvement in segmentation performance. In controlled in-house tests, the PPO controller produced stable motion that was dynamically smoother than that of an inverse kinematics (IK)-based MoveIt baseline. In greenhouse trials, the proposed integrated system harvested 281 strawberries, achieving 96.6% reaching success, 91.3% grasp-and-pull success, and 84.3% overall harvesting success. These results illustrate that task-specific perception combined with simulation-trained PPO can serve as a practical and resource-efficient alternative to conventional planner-dependent reaching in manipulation, enabling reliable closed-loop robotic harvesting in complex agricultural environments.
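The abstract names PPO as the control algorithm but gives no details of the training objective. For reference, the standard PPO clipped surrogate loss can be sketched as below. This is a minimal illustration of the generic algorithm, not the authors' Isaac Lab training code; the function name `ppo_clipped_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a quantity to minimize).

    new_logp / old_logp: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (hypothetical default, not from the paper).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize while PPO maximizes the surrogate.
    return -total / len(advantages)


# Usage: a ratio of 2 with a positive advantage is clipped to 1 + eps.
loss = ppo_clipped_loss([math.log(2.0)], [0.0], [1.0])
```

The clipping removes the incentive to move the policy far from the data-collecting policy in a single update. That property is one reason PPO is a common choice for training in simulation, although the abstract does not state the authors' reasons for selecting it.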