RSPT: Reconstruct Surroundings and Predict Trajectory for Generalizable Active Object Tracking

Authors

  • Fangwei Zhong, Peking University; Beijing Institute for General Artificial Intelligence (BIGAI)
  • Xiao Bi, Peking University
  • Yudi Zhang, Shandong University
  • Wei Zhang, Shandong University
  • Yizhou Wang, Peking University; Zhengzhou University

DOI:

https://doi.org/10.1609/aaai.v37i3.25482

Keywords:

CV: Vision for Robotics & Autonomous Driving, CV: Motion & Tracking, CV: Multi-modal Vision

Abstract

Active Object Tracking (AOT) aims to maintain a specific relation between the tracker and the target object(s) by autonomously controlling the tracker's motion system based on its observations. It is widely used in applications such as mobile robots and autonomous driving. However, building a generalizable active tracker that works robustly across diverse scenarios remains a challenge, particularly in unstructured environments with cluttered obstacles and diverse layouts. We argue that the key to achieving this is a state representation that models both the geometric structure of the surroundings and the dynamics of the target. To this end, we propose a framework called RSPT, which forms a structure-aware motion representation by Reconstructing Surroundings and Predicting the target Trajectory. We further improve the generalization of the policy network by training it with an asymmetric dueling mechanism. Empirical results show that RSPT outperforms existing methods in unseen environments, especially those with cluttered obstacles and diverse layouts. We also demonstrate good sim-to-real transfer when deploying RSPT in real-world scenarios.

Published

2023-06-26

How to Cite

Zhong, F., Bi, X., Zhang, Y., Zhang, W., & Wang, Y. (2023). RSPT: Reconstruct Surroundings and Predict Trajectory for Generalizable Active Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3705-3714. https://doi.org/10.1609/aaai.v37i3.25482

Issue

Vol. 37 No. 3

Section

AAAI Technical Track on Computer Vision III