Camera-to-Robot Pose Estimation from a Single Image


We present an approach for estimating the pose of a camera with respect to a robot from a single image. Our method uses a deep neural network to process an RGB image from the camera to detect 2D keypoints on the robot. The network is trained entirely on simulated data using domain randomization. Perspective-n-point (PnP) is then used to recover the camera extrinsics, assuming that the joint configuration of the robot manipulator is known. Unlike classic hand-eye calibration systems, our method does not require an off-line calibration step; instead, it computes the camera extrinsics from a single frame, which opens the possibility of on-line calibration. We show experimental results for three different camera sensors, demonstrating that our approach achieves better accuracy with a single frame than classic off-line hand-eye calibration achieves with multiple frames. With additional frames, accuracy improves even further. Code, datasets, and pretrained models for three widely used robot manipulators will be made available.
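The PnP step described above can be illustrated with a minimal sketch. It assumes the 2D keypoints have already been detected and that the matching 3D keypoint positions in the robot frame are available from forward kinematics at the known joint configuration. The solver shown is a plain direct linear transform (DLT), swapped in purely for illustration; it is not claimed to be the PnP variant used in this work. The keypoint coordinates, intrinsics, and function names (`project`, `pnp_dlt`) are hypothetical, and the 2D detections are synthesized from a known pose rather than produced by a network.

```python
import numpy as np


def project(points_robot, R, t, K):
    """Pinhole projection of Nx3 robot-frame points to Nx2 pixel coordinates."""
    cam = points_robot @ R.T + t      # robot frame -> camera frame
    uvw = cam @ K.T                   # apply camera intrinsics
    return uvw[:, :2] / uvw[:, 2:3]


def pnp_dlt(points_robot, pixels, K):
    """Estimate R, t (robot -> camera) from >= 6 non-coplanar 3D-2D matches.

    Illustrative DLT solver; assumes known intrinsics K and no lens distortion.
    """
    n = len(points_robot)
    ones = np.ones((n, 1))
    # Normalized image coordinates remove the effect of the known intrinsics.
    xy = (np.hstack([pixels, ones]) @ np.linalg.inv(K).T)[:, :2]
    X = np.hstack([points_robot, ones])           # homogeneous 3D points, Nx4
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X
    A[0::2, 8:12] = -xy[:, 0:1] * X
    A[1::2, 4:8] = X
    A[1::2, 8:12] = -xy[:, 1:2] * X
    # The right singular vector for the smallest singular value is [R|t],
    # known only up to scale and sign.
    P = np.linalg.svd(A)[2][-1].reshape(3, 4)
    if np.mean(X @ P[2]) < 0:                     # keypoints must lie in front of the camera
        P = -P
    # Project the 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    R = U @ D @ Vt
    t = P[:, 3] / S.mean()
    return R, t


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])


# Hypothetical 3D keypoints (meters, robot base frame), standing in for
# forward-kinematics output at a known joint configuration.
keypoints_3d = np.array([
    [0.00, 0.00, 0.00], [0.00, 0.00, 0.33], [0.10, 0.05, 0.60],
    [0.30, -0.10, 0.70], [0.45, 0.10, 0.55], [0.50, 0.20, 0.30],
    [0.55, -0.05, 0.15], [0.20, 0.30, 0.20],
])

# Hypothetical intrinsics and ground-truth extrinsics used to synthesize detections.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R_true = rot_z(0.4) @ rot_x(-1.9)
t_true = np.array([0.1, -0.2, 1.5])
keypoints_2d = project(keypoints_3d, R_true, t_true, K)   # stand-in for network output

R_est, t_est = pnp_dlt(keypoints_3d, keypoints_2d, K)
reproj_err = np.abs(project(keypoints_3d, R_est, t_est, K) - keypoints_2d).max()
print("max reprojection error (pixels):", reproj_err)
```

With real, noisy detections, a robust or iteratively refined solver (for example OpenCV's `cv2.solvePnP`) would normally be preferred over bare DLT. The sketch is only meant to show how 2D-3D keypoint matches and known intrinsics determine the camera extrinsics from a single frame.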

International Conference on Robotics and Automation (ICRA) 2020, Paris, France (to appear)

Work was completed while T. E. Lee was an intern at the NVIDIA Seattle Robotics Lab in summer 2019.

Author Affiliations:

  • Carnegie Mellon University: T. E. Lee, O. Kroemer
  • NVIDIA: J. Tremblay, T. To, J. Cheng, T. Mosier, D. Fox, S. Birchfield