Autonomous Vision-based Rapid Aerial Grasping
Advancing aerial robotics, we develop a vision-based system for autonomous drone grasping, utilizing depth cameras and geometry-based grasp planning to achieve precise object localization without markers or prior knowledge of object size, paving the way for real-world applications.
In a future with autonomous robots, visual and spatial perception is of utmost importance for robotic systems. Particularly for aerial robotics, there are many applications where utilizing visual perception is necessary for any real-world scenarios. Robotic aerial grasping using drones promises fast pick-and-place solutions with a large increase in mobility over other robotic solutions. Utilizing Mask R-CNN scene segmentation (detectron2), we propose a vision-based system for autonomous rapid aerial grasping which does not rely on markers for object localization and does not require the size of the object to be previously known. With spatial information from a depth camera, we generate a point cloud of the detected objects and perform geometry-based grasp planning to determine grasping points on the objects. In real-world experiments, we show that our system can localize objects with a mean error of 3 cm compared to a motion capture ground truth for distances from the object ranging from 0.5 m to 2.5 m. Similar grasping efficacy is maintained compared to a system using motion capture for object localization in experiments. With our results, we show the first use of geometry-based grasping techniques with a flying platform and aim to increase the autonomy of existing aerial manipulation platforms, bringing them further towards real-world applications in warehouses and similar environments.