Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking

Fast point cloud to mesh reconstruction and tracking for multiple deformable objects

The world around us is full of soft objects that we perceive and deform through dexterous hand movements. For a robotic hand to control a soft object, it needs online state feedback of the deforming object. RGB-D cameras can capture partially occluded point clouds at 30 Hz, but a raw point cloud does not provide a continuously trackable object surface. In this work, we therefore developed a method that takes as input a template mesh (the mesh of the object in its non-deformed state) and a deformed point cloud of the same object, and shapes the template mesh so that it matches the deformed point cloud. Reconstructing meshes from point clouds has long been studied in computer graphics under 3D and 4D reconstruction; however, existing approaches lack the speed and generalizability needed for robotics applications. Our model combines a point cloud auto-encoder with a conditional Real-NVP architecture.
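
To make the architecture concrete, below is a minimal PyTorch sketch of the two components: an order-invariant point cloud encoder/decoder and a stack of conditional Real-NVP coupling layers acting on vertex coordinates. The layer widths, the latent dimension of 256, and the six coupling layers are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn


class PointCloudEncoder(nn.Module):
    """PointNet-style encoder: a per-point MLP followed by max pooling,
    so the encoding is invariant to the ordering of the input points."""

    def __init__(self, latent_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, pts):                      # pts: (B, N, 3)
        return self.mlp(pts).max(dim=1).values   # (B, latent_dim)


class PointCloudDecoder(nn.Module):
    """Decodes the latent code back into a fixed-size point cloud;
    used only to supervise the encoder with a chamfer loss."""

    def __init__(self, latent_dim=256, num_points=5000):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, z):                        # z: (B, latent_dim)
        return self.mlp(z).view(-1, self.num_points, 3)


class ConditionalCoupling(nn.Module):
    """One Real-NVP affine coupling layer over per-vertex 3D coordinates,
    conditioned on the point cloud encoding."""

    def __init__(self, latent_dim, mask):
        super().__init__()
        self.register_buffer("mask", mask)       # (3,) binary: 1 = passthrough dims
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 6),                   # per-dim scale and translation
        )

    def forward(self, x, z):                     # x: (B, V, 3), z: (B, latent_dim)
        zc = z.unsqueeze(1).expand(-1, x.shape[1], -1)
        s, t = self.net(torch.cat([x * self.mask, zc], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)                        # bound the scales for stability
        return x * self.mask + (1 - self.mask) * (x * torch.exp(s) + t)


class ConditionalRealNVP(nn.Module):
    """Stack of coupling layers with alternating masks; maps template
    vertex coordinates to deformed vertex coordinates."""

    def __init__(self, latent_dim=256, num_layers=6):
        super().__init__()
        masks = [torch.tensor([1.0, 0.0, 1.0]), torch.tensor([0.0, 1.0, 0.0])]
        self.layers = nn.ModuleList(
            [ConditionalCoupling(latent_dim, masks[i % 2]) for i in range(num_layers)]
        )

    def forward(self, verts, z):
        for layer in self.layers:
            verts = layer(verts, z)
        return verts
```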

Training stage: The auto-encoder, composed of an encoder and a decoder, takes the deformed point cloud as input and learns an encoding via a chamfer loss that compares the decoded (reconstructed) point cloud with the ground-truth deformed point cloud. The conditional Real-NVP model then takes the auto-encoder's encoding and the template mesh as input and learns the coordinates of the deformed mesh, again with a chamfer loss supervised by the ground-truth deformed mesh.
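
One training step under these two losses could look like the sketch below, reusing the modules defined above. The chamfer implementation, the equal loss weighting, the learning rate, and the dataloader `loader` (yielding batches of deformed point clouds, template vertices, and ground-truth deformed vertices) are all assumptions for illustration.

```python
import torch


def chamfer_distance(a, b):
    """Symmetric chamfer distance between point sets a: (B, N, 3), b: (B, M, 3)."""
    d = torch.cdist(a, b)  # (B, N, M) pairwise Euclidean distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()


encoder, decoder = PointCloudEncoder(), PointCloudDecoder()
flow = ConditionalRealNVP()
params = list(encoder.parameters()) + list(decoder.parameters()) + list(flow.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

# `loader` is a hypothetical dataloader yielding
# (deformed point cloud, template vertices, ground-truth deformed vertices).
for deformed_pc, template_verts, gt_verts in loader:
    z = encoder(deformed_pc)                               # encode the observation
    loss_ae = chamfer_distance(decoder(z), deformed_pc)    # auto-encoder branch
    pred_verts = flow(template_verts, z)                   # deform the template
    loss_mesh = chamfer_distance(pred_verts, gt_verts)     # mesh supervision branch
    loss = loss_ae + loss_mesh                             # equal weighting (assumed)
    opt.zero_grad()
    loss.backward()
    opt.step()
```
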
Inference stage: The encoder encodes the deformed point cloud, and the conditional Real-NVP model takes the template mesh and the encoding as input and predicts new coordinates for every vertex of the template mesh. In both training and inference, the deformed mesh therefore consists of the template mesh vertices displaced by the Real-NVP, with faces identical to those of the template mesh.
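
At inference time the decoder is no longer needed; a minimal sketch, again assuming the modules defined above:

```python
import torch


@torch.no_grad()
def reconstruct_mesh(deformed_pc, template_verts, template_faces):
    """Deform the template to match one observed point cloud. Face
    connectivity is reused unchanged, so vertex identities persist
    across frames and the surface can be tracked continuously."""
    z = encoder(deformed_pc)              # (B, latent_dim)
    new_verts = flow(template_verts, z)   # (B, V, 3) predicted vertex positions
    return new_verts, template_faces
```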

Our trained model performs mesh reconstruction and tracking at 58 Hz on a template mesh of 3000 vertices and a deformed point cloud of 5000 points, and it generalizes to the deformations of six object categories (scissors, hammer, foam brick, cleanser bottle, orange, and dice), all assumed to be made of soft material in our experiments. The object meshes are taken from the YCB benchmark dataset. One downstream application is a control algorithm for a robotic hand that requires online feedback on the state of the manipulated object, enabling grasp adaptation in a closed-loop manner. Furthermore, the tracking capability of our method can support marker-free system identification of deforming objects. In future work, we will extend our trained model to generalize beyond these six object categories and to point clouds of real-world deforming objects.
