DOI: 10.1145/3585967.3585990

Multi-Object Tracking based on RGB-D Sensors

Published: 19 April 2023

Abstract

Multi-object tracking (MOT) based on a single 2D camera, which provides no depth information, usually suffers from poor accuracy. In this paper, we propose an MOT method based on a sensor combination of a camera and an ultra-wideband (UWB) radar, which together function similarly to a depth (RGB-D) camera. First, we establish a backbone network to extract feature maps from the video frames captured by the camera. Then, we combine Faster R-CNN with a re-ID branch to detect objects, yielding each object's category, coordinates, and ID. To track objects, we construct a similarity matrix that encodes the data association between the detected objects and their historical trajectories. Each element of the matrix is the intersection over union (IoU) between an object and its two associated trajectory types, derived from the image data and the UWB localization data, respectively. Finally, the stored trajectories are updated from these two trajectory types, and the recognition network is updated via the localization loss. Experimental results show that our method achieves multi-object recognition and tracking, and outperforms previous methods by a large margin on several public datasets.
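
The abstract does not include the paper's implementation, but the association step it describes can be sketched compactly. The following Python snippet is a minimal illustration, not the authors' code: the fusion weight alpha, the match threshold min_iou, and all function names are assumptions introduced here. It builds a similarity matrix of IoU scores between current detections and the two trajectory types (image-based and UWB-based), then solves the assignment with the Hungarian algorithm, a common choice that the abstract does not explicitly confirm.

    # Minimal sketch of IoU-based data association between detections and
    # trajectories, as described in the abstract. alpha, min_iou, and all
    # names are illustrative assumptions, not the paper's implementation.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou(a, b):
        """IoU of two boxes in (x1, y1, x2, y2) format."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def associate(detections, image_tracks, uwb_tracks,
                  alpha=0.5, min_iou=0.3):
        """Match detections to trajectories via a fused similarity matrix.

        Each element blends the IoU against a track's image-based
        trajectory and its UWB-localization trajectory.
        """
        sim = np.zeros((len(detections), len(image_tracks)))
        for i, det in enumerate(detections):
            for j, (img_box, uwb_box) in enumerate(
                    zip(image_tracks, uwb_tracks)):
                sim[i, j] = (alpha * iou(det, img_box)
                             + (1 - alpha) * iou(det, uwb_box))
        # The Hungarian algorithm minimizes cost, so negate the similarity.
        rows, cols = linear_sum_assignment(-sim)
        # Discard weak matches; unmatched detections would spawn new tracks.
        return [(i, j) for i, j in zip(rows, cols) if sim[i, j] >= min_iou]

Under this reading, the UWB-based trajectory acts as a depth-informed prior that keeps associations stable when image-only IoU is ambiguous, which matches the abstract's motivation for augmenting the 2D camera.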


Cited By

  • (2024) Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study. Information Fusion, Vol. 105, Article 102247, May 2024. DOI: 10.1016/j.inffus.2024.102247


Published In

icWCSN '23: Proceedings of the 2023 10th International Conference on Wireless Communication and Sensor Networks
January 2023
162 pages
ISBN: 9781450398466
DOI: 10.1145/3585967
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Faster R-CNN network
  2. RGB-D
  3. re-ID network
  4. multi-object tracking

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Science and Technology Projects of State Grid Corporation of China

Conference

icWCSN 2023


