Visual–inertial object tracking: Incorporating camera pose into motion models

Published: 01 November 2023

Abstract

Visual object tracking for autonomous aerial robots becomes challenging in the presence of fast target or camera motions and long-term occlusions. This paper presents a visual–inertial tracking paradigm that incorporates the camera's kinematic states into visual object tracking pipelines. We gathered a dataset of image sequences augmented with measurements of the camera's position and orientation as well as the object's position. For cases of long-term object occlusion, we provide ground-truth boxes derived by mapping the measured object position onto the image frame. A search-zone proposal method is developed based on estimating the object's future position in the inertial frame and projecting it back onto the image frame using the camera states. This search zone, which is robust to fast camera and target motions, is fused into the original search-zone settings of the base tracker. Also proposed is a measure indicating the confidence of a tracking structure in following the correct target, used to report tracking failures in time. Accordingly, the model-updating mechanism of the base tracker is modulated to avoid recovering wrong objects as the target. The proposed modifications are benchmarked on nine visual object tracking algorithms, including five state-of-the-art deep learning structures: DiMP, PrDiMP, KYS, ToMP, and MixFormer. Most of the trackers are remarkably improved by the modifications, with up to an 8% increase in precision. The modified PrDiMP tracker yields the best precision, 68.4%, higher than all of the original (and modified) trackers considered. Source code and dataset are available online at https://github.com/sehomi/pyTrackers.
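The core of the search-zone proposal described above is standard projective geometry: predict the target's next position in the inertial (world) frame with a motion model, then project that prediction onto the image plane through the measured camera pose. Below is a minimal Python sketch of this idea, assuming a pinhole camera with intrinsics K and a camera pose (R_wc, t_wc) expressed in the world frame; the constant-velocity motion model and all function names are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def project_to_image(p_world, R_wc, t_wc, K):
    """Project a 3-D point in the inertial frame onto the image plane.

    p_world : (3,) point in the world/inertial frame
    R_wc    : (3, 3) camera orientation expressed in the world frame
    t_wc    : (3,) camera position in the world frame
    K       : (3, 3) pinhole intrinsic matrix
    """
    p_cam = R_wc.T @ (p_world - t_wc)   # world frame -> camera frame
    if p_cam[2] <= 0:                   # point is behind the image plane
        return None
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]             # perspective division -> (u, v)

def propose_search_zone(positions, dt, R_wc, t_wc, K, half_size=64.0):
    """Center a square search zone on the projected position prediction.

    positions : list of (3,) recent target positions in the inertial frame
    dt        : time step between the last two positions (and to the next frame)
    """
    v_est = (positions[-1] - positions[-2]) / dt    # finite-difference velocity
    p_pred = positions[-1] + v_est * dt             # constant-velocity prediction
    center = project_to_image(p_pred, R_wc, t_wc, K)
    if center is None:
        return None                                 # prediction is not visible
    u, v = center
    return (u - half_size, v - half_size, u + half_size, v + half_size)

Because the prediction lives in the inertial frame, a zone proposed this way compensates for camera ego-motion automatically: a fast camera rotation moves the zone across the image even when the target itself is stationary, which is why such a zone stays valid under fast camera/target motions and can be fused with the base tracker's own search-zone settings.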

Published In

Expert Systems with Applications: An International Journal, Volume 229, Issue PA
Nov 2023
1358 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Visual object tracking
  2. Object tracking dataset
  3. Aerial robot
  4. Deep learning
  5. Visual–inertial navigation

Qualifiers

  • Research-article
