Long-term Visual Tracking: Review and Experimental Comparison

Abstract

As a fundamental task in computer vision, visual object tracking has received much attention in recent years. Most studies focus on short-term tracking, which deals with short videos in which the target is always visible. Long-term tracking, however, is much closer to practical applications and poses more complicated challenges: sequences last minutes or even hours, and the tracker must handle frequent target disappearance and reappearance. In this paper, we provide a thorough review of long-term tracking, summarizing long-term tracking algorithms from two perspectives: framework architecture and the utilization of intermediate tracking results. We then describe the existing benchmarks and their evaluation protocols in detail. Furthermore, we conduct extensive experiments and analyse the performance of trackers on six benchmarks: VOTLT2018, VOTLT2019 (2020/2021), OxUvA, LaSOT, TLP and the long-term subset of VTUAV-V. Finally, we discuss future prospects from multiple perspectives, including algorithm design and benchmark construction. To our knowledge, this is the first comprehensive survey of long-term visual object tracking. The relevant content is available at https://github.com/wangdong-dut/Long-term-Visual-Tracking.

References

M. Mueller, N. Smith, B. Ghanem. A benchmark and simulator for UAV tracking. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp.445–461, 2016. DOI: https://doi.org/10.1007/978-3-319-46448-0_27.
Google Scholar
A. Moudgil, V. Gandhi. Long-term visual object tracking benchmark. In Proceedings of the 14th Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 629–645, 2019. DOI: https://doi.org/10.1007/978-3-030-20890-5_40.
Google Scholar
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. Now you see me: Evaluating performance in long-term visual tracking. [Online], Available: https://arxiv.org/abs/1804.07056, 2018.
Z. Kalal, K. Mikolajczyk, J. Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409–1422, 2012. DOI: https://doi.org/10.1109/TPAMI.2011.239.
Article Google Scholar
J. Valmadre, L. Bertinetto, J. F. Henriques, R. Tao, A. Vedaldi, A. W. M. Smeulders, P. H. S. Torr, E. Gavves. Long-term tracking in the wild: A benchmark. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 692–707, 2018. DOI: https://doi.org/10.1007/978-3-030-01219-9_41.
Google Scholar
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. Performance evaluation methodology for long-term visual object tracking. [Online], Available: https://arxiv.org/abs/1906.08675, 2019.
Y. H. Zhang, L. J. Wang, D. Wang, J. Q. Qi, H. C. Lu. Learning regression and verification networks for robust long-term tracking. International Journal of Computer Vision, vol. 129, no. 9, pp. 2536–2547, 2021. DOI: https://doi.org/10.1007/s11263-021-01487-3.
Article Google Scholar
B. Yan, H. J. Zhao, D. Wang, H. C. Lu, X. Y. Yang. ‘Skimming-perusal’ tracking: A framework for real-time and robust long-term tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 2385–2393, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00247.
Google Scholar
K. N. Dai, Y. H. Zhang, D. Wang, J. H. Li, H. C. Lu, X. Y. Yang. High-performance long-term tracking with meta-updater. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp.6297–6306, 2020. DOI: https://doi.org/10.1109//CVPR42600.2020.00633.
Google Scholar
C. Mayer, M. Danelljan, D. P. Paudel, L. Van Gool. Learning target candidate association to keep track of what not to track. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 13424–13434, 2021. DOI: https://doi.org/10.1109/ICCV48922.2021.01319.
Google Scholar
P. Voigtlaender, J. Luiten, P. H. S. Torr, B. Leibe. Siam R-CNN: Visual tracking by re-detection. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6577–6587, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00661.
Google Scholar
X. Q. Zhang, R. H. Jiang, C. X. Fan, T. Y. Tong, T. Wang, P. C. Huang. Advances in deep learning methods for visual tracking: Literature review and fundamentals. International Journal of Automation and Computing, vol. 18, no. 3, pp. 311–333, 2021. DOI: https://doi.org/10.1007/s11633-020-1274-8.
Article Google Scholar
P. X. Li, D. Wang, L. J. Wang, H. C. Lu. Deep visual tracking: Review and experimental comparison. Pattern Recognition, vol. 76, pp. 323–338, 2018. DOI: https://doi.org/10.1016/j.patcog.2017.11.007.
Article Google Scholar
S. M. Marvasti-Zadeh, L. Cheng, H. Ghanei-Yakhdan, S. Kasaei. Deep learning for visual tracking: A comprehensive survey. IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 5, pp. 3943–3968, 2022. DOI: https://doi.org/10.1109/TITS.2020.3046478.
Article Google Scholar
D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui. Visual object tracking using adaptive correlation filters. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, San Francisco, USA, pp. 2544–2550, 2010. DOI: https://doi.org/10.1109/CVPR.2010.5539960.
Google Scholar
J. F. Henriques, R. Caseiro, P. Martins, J. Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the 12th European Conference on Computer Vision, Springer, Florence, Italy, pp. 702–715, 2012. DOI: https://doi.org/10.1007/978-3-642-33765-9_50.
Google Scholar
J. F. Henriques, R. Caseiro, P. Martins, J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015. DOI: https://doi.org/10.1109/TPAMI.2014.2345390.
Article Google Scholar
Y. Li, J. K. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 254–265, 2015. DOI: https://doi.org/10.1007/978-3-319-16181-5_18.
Google Scholar
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference, BMVA Press, Nottingham, UK, pp. 1–11, 2014. DOI: https://doi.org/10.5244/C.28.65.
Google Scholar
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In Proceedings of IEEE International Conference on Computer Vision, Santiago, Chile, pp. 4310–4318, 2015. DOI: https://doi.org/10.1109/ICCV.2015.490.
M. Danelljan, A. Robinson, F. S. Khan, M. Felsberg. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 472–488, 2016. DOI: https://doi.org/10.1007/978-3-319-46454-1_29.
Google Scholar
M. Danelljan, G. Bhat, F. S. Khan, M. Felsberg. ECO: Efficient convolution operators for tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 6931–6939, 2017. DOI: https://doi.org/10.1109/CVPR2017.733
R. Tao, E. Gavves, A. W. M. Smeulders. Siamese instance search for tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 1420–1429, 2016. DOI: https://doi.org/10.1109/CVPR.2016.158.
L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. S. Torr. Fully-convolutional siamese networks for object tracking. In Proceedings of the European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp.850–865, 2016. DOI: https://doi.org/10.1007/978-3-319-48881-3_56.
B. Li, J. J. Yan, W. Wu, Z. Zhu, X. L. Hu. High performance visual tracking with Siamese region proposal network. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp.8971–8980, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00935.
Y. D. Xu, Z. Y. Wang, Z. X. Li, Y. Yuan, G. Yu. Siam-FC++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 12549–12556, 2020. DOI: https://doi.org/10.1609/aaai.v34i07.6944.
Z. P. Zhang, H. W. Peng, J. L. Fu, B. Li, W. M. Hu. Ocean: Object-aware anchor-free tracking. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 771–787, 2020. DOI: https://doi.org/10.1007/978-3-030-58589-1_46.
Google Scholar
Z. D. Chen, B. N. Zhong, G. R. Li, S. P. Zhang, R. R. Ji. Siamese box adaptive network for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6667–6676, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00670.
Google Scholar
D. Y. Guo, J. Wang, Y. Cui, Z. H. Wang, S. Y. Chen. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6268–6276, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00630.
Google Scholar
Z. Zhu, Q. Wang, B. Li, W. Wu, J. J. Yan, W. M. Hu. Distractor-aware siamese networks for visual object tracking. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 103–119, 2018. DOI: https://doi.org/10.1007/978-3-030-01240-3_7.
Google Scholar
B. Li, W. Wu, Q. Wang, F. Y. Zhang, J. L. Xing, J. J. Yan. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4277–4286, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00441.
Google Scholar
Z. P. Zhang, H. W. Peng. Deeper and wider siamese networks for real-time visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4586–4595, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00472.
Google Scholar
M. Danelljan, G. Bhat, F. S. Khan, M. Felsberg. ATOM: Accurate tracking by overlap maximization. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4655–4664, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00479.
Google Scholar
G. Bhat, M. Danelljan, L. Van Gool, R. Timofte. Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, Long Beach, USA, pp. 6181–6190, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00628.
Google Scholar
M. Danelljan, L. Van Gool, R. Timofte. Probabilistic regression for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 7181–7190, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00721.
Google Scholar
G. Bhat, M. Danelljan, L. Van Gool, R. Timofte. Know your surroundings: Exploiting scene information for object tracking. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 205–221, 2020. DOI: https://doi.org/10.1007/978-3-030-58592-1_13.
Google Scholar
X. Chen, B. Yan, J. W. Zhu, D. Wang, X. Y. Yang, H. C. Lu. Transformer tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 8122–8131, 2021. DOI: https://doi.org/10.1109/CVPR46437.2021.00803.
Google Scholar
B. Yan, H. W. Peng, J. L. Fu, D. Wang, H. C. Lu. Learning spatio-temporal transformer for visual tracking. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 10428–10437, 2021. DOI: https://doi.org/10.1109/ICCV48922.2021.01028.
Google Scholar
S. Karthik, A. Moudgil, V. Gandhi. Exploring 3 R’s of long-term tracking: Re-detection, recovery and reliability. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, pp. 1000–1009, 2020. DOI: https://doi.org/10.1109/WACV45572.2020.9093465.
T. P. Kuipers, D. Arya, D. K. Gupta. Hard occlusions in visual object tracking. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 299–314, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_22.
Google Scholar
A. Lukezic, U. Kart, J. Käpylä, A. Durmush, J. K. Kamarainen, J. Matas, M. Kristan. CDTB: A color and depth visual object tracking dataset and benchmark. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 10012–10021, 2019. DOI: https://doi.org/10.1109/ICCV.2019.01011.
Google Scholar
Y. L. Qian, S. Yan, A. Lukežič, M. Kristan, J. K. Kämäräinen, J. Matas. DAL: A deep depth-aware long-term tracker. In Proceedings of the 25th International Conference on Pattern Recognition, IEEE, Milan, Italy, pp. 7825–7832, 2021. DOI
Google Scholar
U. Kart, A. Lukežič, M. Kristan, J. K. Kämäräinen, J. Matas. Object tracking by reconstruction with view-specific discriminative correlation filters. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 1339–1348, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00143.
Google Scholar
G. Nebehay, R. Pflugfelder. Clustering of static-adaptive correspondences for deformable object tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 2784–2791, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298895.
Y. Hua, K. Alahari, C. Schmid. Occlusion and motion reasoning for long-term tracking. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 172–187, 2014. DOI: https://doi.org/10.1007/978-3-319-10599-4_12.
Google Scholar
C. Ma, X. K. Yang, C. Y. Zhang, M. H. Yang. Long-term correlation tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 5388–5396, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7299177.
N. Wang, W. G. Zhou, H. Q. Li. Reliable re-detection for long-term tracking. IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 3, pp. 730–743, 2019. DOI: https://doi.org/10.1109/TCSVT.2018.2816570.
Article Google Scholar
L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, P. H. S. Torr. Staple: Complementary learners for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 1401–1409, 2016. DOI: https://doi.org/10.1109/CVPR.2016.156.
H. Fan, H. B. Ling. Parallel tracking and verifying. IEEE Transactions on Image Processing, vol. 28, no. 8, pp. 4130–4144, 2019. DOI: https://doi.org/10.1109/TIP.2019.2904789.
Article MATH Google Scholar
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1561–1575, 2017. DOI: https://doi.org/10.1109/TPAMI.2016.2609928.
Article Google Scholar
Z. B. Hong, Z. Chen, C. H. Wang, X. Mei, D. Prokhorov, D. C. Tao. Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 749–758, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298675.
N. X. Liang, G. L. Wu, W. X. Kang, Z. Y. Wang, D. D. Feng. Real-time long-term tracking with prediction-detection-correction. IEEE Transactions on Multimedia, vol. 20, no. 9, pp. 2289–2302, 2018. DOI: https://doi.org/10.1109/TMM.2018.2803518.
Article Google Scholar
J. W. Liao, C. Qi, J. Z. Cao, L. Ren, G. P. Zhang. Real-time long-term tracker with tracking-verification-detection-refinement. Journal of Visual Communication and Image Representation, vol. 72, Article number 102896, 2020. DOI: https://doi.org/10.1016/j.jvcir.2020.102896.
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. FuCoLoT-a fully-correlational long-term tracker. In Proceedings of the 14th Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 595–611, 2019. DOI: https://doi.org/10.1007/978-3-030-20890-5_38.
Google Scholar
A. Lukežic, T. Vojír, L. C. Zajc, J. Matas, M. Kristan. Discriminative correlation filter with channel and spatial reliability. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 4847–4856, 2017. DOI: https://doi.org/10.1109/CVPR.2017.515.
Z. P. Wang, H. Wang, B. F. Fang, C. J. Xie. Support vector correlation filter with long-term tracking. Signal, Image and Video Processing, vol. 12, no. 8, pp. 1541–1549, 2018. DOI: https://doi.org/10.1007/s11760-018-1310-0.
Article Google Scholar
F. Tang, Q. Ling. Contour-aware long-term tracking with reliable re-detection. IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4739–4754, 2020. DOI: https://doi.org/10.1109/TCSVT.2019.2957748.
Article Google Scholar
H. Lee, S. Choi, C. Kim. A memory model based on the siamese network for long-term tracking. In Proceedings of the European Conference on Computer Vision Workshops, Springer, Munich, Germany, pp. 100–115, 2019. DOI: https://doi.org/10.1007/978-3-030-11009-3_5.
Google Scholar
E. Gavves, R. Tao, D. K. Gupta, A. W. M. Smeulders. Model decay in long-term tracking. In Proceedings of the 25th International Conference on Pattern Recognition, IEEE, Milan, Italy, pp. 2685–2692, 2021. DOI: https://doi.org/10.1109/ICPR48806.2021.9412648.
Google Scholar
A. G. Howard, M. L. Zhu, B. Chen, D. Kalenichenko, W. J. Wang, T. Weyand, M. Andreetto, H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. [Online], Available: https://arxiv.org/abs/1704.04861, 2017.
H. Nam, B. Han. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 4293–4302, 2016. DOI: https://doi.org/10.1109/CVPR.2016.465.
H. Wu, X. Y. Yang, Y. Yang, G. Z. Liu. Flow guided short-term trackers with cascade detection for long-term tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 170–178, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00026.
Google Scholar
K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. [Online], Available: https://arxiv.org/abs/1409.1556, 2014.
K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 770–778, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90.
M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, L. Č. Zajc, T. Vojir, G. Bhat, A. Lukežič, A. Eldesokey, G. Fernández, Á. García-Martín, Á. Iglesias-Arias, A. A. Alatan, A. González-García, A. Petrosino, A. Memarmoghadam, A. Vedaldi, A. Muhič, A. F. He, A. Smeulders, A. G. Perera, B. Li, B. Y. Chen, C. Kim, C. S. Xu, C. Z. Xiong, C. Tian, C. Luo, C. Sun, C. Hao, D. Kim, D. Mishra, D. M. Chen, D. Wang, D. Wee, E. Gavves, E. Gundogdu, E. Velasco-Salido, F. S. Khan, F. Yang, F. Zhao, F. Li, F. Battistone, G. De Ath, G. R. K. S. Subrahmanyam, G. Bastos, H. B. Ling, H. K. Galoogahi, H. Lee, H. J. Li, H. J. Zhao, H. Fan, H. G. Zhang, H. Possegger, H. Q. Li, H. C. Lu, H. Zhi, H. Y. Li, H. Lee, H. J. Chang, I. Drummond, J. Valmadre, J. S. Martin, J. Chahl, J. Y. Choi, J. Li, J. Q. Wang, J. Q. Qi, J. Sung, J. Johnander, J. Henriques, J. Choi, J. Van De weijer, J. R. Herranz, J. M. Martínez, J. Kittler, J. F. Zhuang, J. Y. Gao, K. Grm, L. C. Zhang, L. J. Wang, L. X. Yang, L. Rout, L. Si, L. Bertinetto, L. T. Chu, M. Q. Che, M. E. Maresca, M. Danelljan, M. H. Yang, M. Abdelpakey, M. Shehata, M. Y. N. G. Kang, N. Lee, N. Wang, O. Miksik, P. Moallem, P. Vicente-Moñivar, P. Senna, P. X. Li, P. Torr, P. M. Raju, Q. Ruihe, Q. Wang, Q. Zhou, Q. Guo, R. Martín-Nieto, R. K. Gorthi, R. Tao, R. Bowden, R. Everson, R. L. Wang, S. Yun, S. Choi, S. Vivas, S. Bai, S. P. Huang, S. H. Wu, S. Hadfield, S. W. Wang, S. Golodetz, T. Ming, T. Y. Xu, T. Z. Zhang, T. Fischer, V. Santopietro, V. Štruc, W. Wei, W. M. Zuo, W. Feng, W. Wu, W. Zou, W. M. Hu, W. G. Zhou, W. J. Zeng, X. F. Zhang, X. H. Wu, X. J. Wu, X. M. Tian, Y. Li, Y. Lu, Y. W. Law, Y. Wu, Y. Demiris, Y. C. Yang, Y. F. Jiao, Y. H. Li, Y. H. Zhang, Y. X. Sun, Z. Zhang, Z. Zhu, Z. H. Feng, Z. H. Wang, Z. Q. He. The sixth visual object tracking VOT2018 challenge results. In Proceedings of the European Conference on Computer Vision Workshops, Springer, Munich, Germany, pp. 3–53, 2019. 
DOI: https://doi.org/10.1007/978-3-030-11009-3_1.
Google Scholar
M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, L. C. Zajc, O. Drbohlav, A. Lukezic, A. Berg, A. Eldesokey, J. Käpylä, G. Fernández, A. Gonzalez-Garcia, A. Memarmoghadam, A. D. Lu, A. F. He, A. Varfolomieiev, A. Chan, A. S. Tripathi, A. Smeulders, B. S. Pedasingu, B. X. Chen, B. P. Zhang, B. Y. Wu, B. Li, B. He, B. Yan, B. Bai, B. Li, B. Li, B. H. Kim, C. Ma, C. Fang, C. Qian, C. Chen, C. L. Li, C. Q. Zhang, C. Y. Tsai, C. Luo, C. Micheloni, C. H. Zhang, D. C. Tao, D. Gupta, D. J. Song, D. Wang, E. Gavves, E. Yi, F. S. Khan, F. Y. Zhang, F. Wang, F. Zhao, G. De Ath, G. Bhat, G. Q. Chen, G. T. Li, H. Cevikalp, H. Du, H. J. Zhao, H. Saribas, H. M. Jung, H. L. Bai, H. Y. Yu, H. Y. Yu, H. W. Peng, H. C. Lu, H. Li, J. K. Li, J. H. Li, J. L. Fu, J. Chen, G. Gao, J. Zhao, J. Tang, J. Li, J. J. Wu, J. T. Liu, J. Q. Wang, J. Q. Qi, J. Y. Zhang, J. K. Tsotsos, J. H. Lee, J. van de Weijer, J. Kittler, J. H. Lee, J. F. Zhuang, K. K. Zhang, K. K. Wang, K. N. Dai, L. Chen, L. Liu, L. D. Guo, L. Zhang, L. Wang, L. L. Wang, L. C. Zhang, L. J. Wang, L. J. Zhou, L. Y. Zheng, L. T. Rout, L. Van Gool, L. Bertinetto, M. Danelljan, M. Dunnhofer, M. Ni, M. Y. Kim, M. Tang, M. H. Yang, N. Paluru, N. Martinel, P. F. Xu, P. F. Zhang, P. K. Zheng, P. Y. Zhang, P. H. S. Torr, Q. Z. Q. Wang, Q. Guo, R. Timofte, R. K. Gorthi, R. Everson, R. Z. Han, R. H. Zhang, S. You, S. C. Zhao, S. W. Zhao, S. H. Li, S. K. Li, S. M. Ge, S. Bai, S. S. Guan, T. F. Xing, T. Y. Xu, T. Y. Yang, T. Zhang, T. Vojir, W. Feng, W. M. Hu, W. Z. Wang, W. J. Tang, W. J. Zeng, W. Y. Liu, X. Chen, X. Qiu, X. Bai, X. J. Wu, X. Y. Yang, X. E. Chen, X. Li, X. Sun, X. Y. Chen, X. M. Tian, X. Tang, X. F. Zhu, Y. Huang, Y. N. Chen, Y. C. Lian, Y. Gu, Y. Liu, Y. J. Chen, Y. Zhang, Y. D. Xu, Y. M. Wang, Y. P. Li, Y. Zhou, Y. Dong, Y. F. Xu, Y. H. Zhang, Y. K. Li, Z. W. Z. Luo, Z. L. Zhang, Z. H. Feng, Z. Y. He, Z. C. Song, Z. H. Chen, Z. P. Zhang, Z. R. Wu, Z. W. Xiong, Z. J. Huang, Z. Teng, Z. H. 
Ni. The seventh visual object tracking VOT2019 challenge results. In Proceedings of IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 2206–2241, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00276.
Google Scholar
M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, M. Danelljan, L. Č. Zajc, A. Lukežič, O. Drbohlav, L. B. He, Y. S. Zhang, S. Yan, J. Y. Yang, G. Fernández, A. Hauptmann, A. Memarmoghadam, Á. García-Martín, A. Robinson, A. Varfolomieiev, A. H. Gebrehiwot, B. Uzun, B. Yan, B. Li, C. Qian, C. Y. Tsai, C. Micheloni, D. Wang, F. Wang, F. Xie, F. J. Lawin, F. Gustafsson, G. L. Foresti, G. Bhat, G. Q. Chen, H. B. Ling, H. T. Zhang, H. Cevikalp, H. J. Zhao, H. R. Bai, H. C. Kuchibhotla, H. Saribas, H. Fan, H. Ghanei-Yakhdan, H. Q. Li, H. W. Peng, H. C. Lu, H. Li, J. Khaghani, J. Bescos, J. H. Li, J. L. Fu, J. Q. Yu, J. T. Xu, J. Kittler, J. Yin, J. Lee, K. C. Yu, K. W. Liu, K. Yang, K. N. Dai, L. Cheng, L. Zhang, L. J. Wang, L. Y. Wang, L. Van Gool, L. Bertinetto, M. Dunnhofer, M. Cheng, M. M. Dasari, N. Wang, N. Wang, P. Y. Zhang, P. H. S. Torr, Q. Wang, R. Timofte, R. K. S. Gorthi, S. Choi, S. M. Marvasti-Zadeh, S. C. Zhao, S. Kasaei, S. M. Qiu, S. H. Chen, T. B. Schön, T. Y. Xu, W. Lu, W. M. Hu, W. G. Zhou, X. Qiu, X. Ke, X. J. Wu, X. L. Zhang, X. Y. Yang, X. F. Zhu, Y. J. Jiang, Y. M. Wang, Y. W. Chen, Y. Ye, Y. Z. Li, Y. Yao, Y. Lee, Y. Z. Gu, Z. Z. Wang, Z. Y. Tang, Z. H. Feng, Z. J. Mai, Z. P. Zhang, Z. R. Wu, Z. A. Ma. The eighth visual object tracking VOT2020 challenge results. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 547–601, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_39.
Google Scholar
Q. Wang, L. Zhang, L. Bertinetto, W. M. Hu, P. H. S. Torr. Fast online object tracking and segmentation: A unifying approach. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 1328–1338, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00142.
Google Scholar
W. H. Zhang, H. R. Wang, Z. J. Huang, Y. X. Li, J. L. Zhou, L. C. Jiao. Accuracy and long-term tracking via overlap maximization integrated with motion continuity. In Proceedings of IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 109–117, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00019.
Google Scholar
S. Choi, J. Lee, Y. S. Lee, A. Hauptmann. Robust long-term object tracking via improved discriminative model prediction. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 602–617, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_40.
Google Scholar
G. Zhu, F. Porikli, H. D. Li. Beyond local search: Tracking objects everywhere with instance-specific proposals. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 943–951, 2016. DOI: https://doi.org/10.1109/CVPR.2016.108.
C. L. Zitnick, P. Dollár. Edge boxes: Locating object proposals from edges. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp.391–405, 2014. DOI: https://doi.org/10.1007/978-3-319-10602-1_26.
H. Liu, Q. Y. Hu, B. Li, Y. L. Guo. Robust long-term tracking via instance-specific proposals. IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 4, pp. 950–962, 2020. DOI: https://doi.org/10.1109/TIM.2019.2908715.
Article Google Scholar
D. Q. Sun, X. D. Yang, M. Y. Liu, J. Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp.8934–8943, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00931.
J. Q. Wang, K. Chen, S. Yang, C. C. Loy, D. H. Lin. Region proposal by guided anchoring, In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp.2960–2969, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00308.
S. Q. Ren, K. M. He, R. Girshick, J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017. DOI: https://doi.org/10.1109/TPAMI.2016.2577031.
Article Google Scholar
I. Jung, J. Son, M. Baek, B. Han. Real-time MDNet. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 89–104, 2018. DOI: https://doi.org/10.1007/978-3-030-01225-0_6.
Google Scholar
M. E. Maresca, A. Petrosino. MATRIOSKA: A multi-level approach to fast tracking by learning. In Proceedings of the International Conference on Image Analysis and Processing, Springer, Naples, Italy, pp. 419–428, 2013. DOI: https://doi.org/10.1007/978-3-642-41184-7_43.
Google Scholar
J. S. Supancic III, D. Ramanan. Self-paced learning for long-term tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, pp. 2379–2386, 2013. DOI: https://doi.org/10.1109/CVPR.2013.308.
A. Dave, P. Tokmakov, C. Schmid, D. Ramanan. Learning to track any object. [Online], Available: https://arxiv.org/abs/1910.11844, 2019.
K. M. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 386–397, 2020. DOI: https://doi.org/10.1109/TPAMI.2018.2844175.
Article Google Scholar
Z. K. Zhang, B. N. Zhong, S. P. Zhang, Z. J. Tang, X. Liu, Z. X. Zhang. Distractor-aware fast tracking via dynamic convolutions and MOT philosophy. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 1024–1033, 2021. DOI: https://doi.org/10.1109/CVPR46437.2021.00108.
Google Scholar
L. H. Huang, X. Zhao, K. Q. Huang. GlobalTrack: A simple and strong baseline for long-term tracking. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 11037–11044, 2020. DOI: https://doi.org/10.1609/aaai.v34i07.6758.
J. Choi, J. Kwon, K. M. Lee. Visual tracking by Trident Align and context embedding. In Proceedings of the 15th Asian Conference on Computer Vision, Springer, Kyoto, Japan, pp. 504–520, 2021. DOI: https://doi.org/10.1007/978-3-030-69532-3_31.
Google Scholar
Z. B. Li, Q. Wang, J. Gao, B. Li, W. M. Hu. Globally spatial-temporal perception: A long-term tracking system. In Proceedings of IEEE International Conference on Image Processing, Abu Dhabi, UAE, pp. 2066–2070, 2020. DOI: https://doi.org/10.1109/ICIP40778.2020.9191319.
X. Wang, Z. Chen, J. Tang, B. Luo, Y. W. Wang, Y. H. Tian, F. Wu. Dynamic attention guided multi-trajectory analysis for single object tracking. IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 12, pp. 4895–4908, 2021. DOI: https://doi.org/10.1109/TCSVT.2021.3056684.
Article Google Scholar
Y. Wu, J. Lim, M. H. Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1834–1848, 2015. DOI: https://doi.org/10.1109/TPAMI.2014.2388226.
Article Google Scholar
M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, H. J. Chang, M. Danelljan, L. Č. Zajc, A. Lukežič, O. Drbohlav, J. Käpylä, G. Häger, S. Yan, J. Y. Yang, Z. Q. Zhang, G. Fernández, M. Abdelpakey, G. Bhat, L. Cerkezi, H. Cevikalp, S. Y. Chen, X. Chen, M. Cheng, Z. Y. Cheng, Y. C. Chiu, O. Cirakman, Y. T. Cui, K. N. Dai, M. M. Dasari, Q. Deng, X. P. Dong, D. K. Du, M. Dunnhofer, Z. H. Feng, Z. Y. Feng, Z. H. Fu, S. M. Ge, R. K. Gorthi, Y. Z. Gu, B. Gunsel, Q. Guo, F. Gurkan, W. C. Han, Y. Y. Huang, F. J. Lawin, S. J. Jhang, R. G. Ji, C. Jiang, Y. J. Jiang, F. Juefei-Xu, Y. Jun, X. Ke, F. S. Khan, B. H. Kim, J. Kittler, X. Y. Lan, J. H. Lee, B. Leibe, H. Li, J. H. Li, X. X. Li, Y. Z. Li, B. Liu, C. Liu, J. G. Liu, L. Liu, Q. J. Liu, H. C. Lu, W. Lu, J. Luiten, J. Ma, Z. Ma, N. Martinel, C. Mayer, A. Memarmoghadam, C. Micheloni, Y. Z. Niu, D. Paudel, H. W. Peng, S. M. Qiu, A. Rajiv, M. Rana, A. Robinson, H. Saribas, L. Shao, M. Shehata, F. Shen, J. B. Shen, K. Simonato, X. N. Song, Z. Y. Tang, R. Timofte, P. Torr, C. Y. Tsai, B. Uzun, L. Van Gool, P. Voigtlaender, D. Wang, G. T. Wang, L. L. Wang, L. J. Wang, L. M. Wang, L. Y. Wang, Y. Wang, Y. H. Wang, C. Y. Wu, G. S. Wu, X. J. Wu, F. Xie, T. Y. Xu, X. Xu, W. L. Xue, B. Yan, W. K. Yang, X. Y. Yang, Y. Ye, J. Yin, C. W. Zhang, C. H. Zhang, H. T. Zhang, K. H. Zhang, K. K. Zhang, X. H. Zhang, X. L. Zhang, X. Y. Zhang, Z. B. Zhang, S. C. Zhao, M. Zhen, B. N. Zhong, J. W. Zhu, X. F. Zhu. The ninth visual object tracking VOT2021 challenge results. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 2711–2738, 2021. DOI: https://doi.org/10.1109/ICCVW54120.2021.00305.
H. Fan, L. T. Lin, F. Yang, P. Chu, G. Deng, S. J. Yu, H. X. Bai, Y. Xu, C. Y. Liao, H. B. Ling. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 5369–5378, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00552.
M. Müller, A. Bibi, S. Giancola, S. Alsubaihi, B. Ghanem. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 310–327, 2018. DOI: https://doi.org/10.1007/978-3-030-01246-5_19.
P. Y. Zhang, J. Zhao, D. Wang, H. C. Lu, X. Ruan. Visible-thermal UAV tracking: A large-scale benchmark and new baseline. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022.
T. Y. Yang, A. B. Chan. Learning dynamic memory networks for object tracking. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 153–169, 2018. DOI: https://doi.org/10.1007/978-3-030-01240-3_10.
Z. D. Wang, H. S. Zhao, Y. L. Li, S. J. Wang, P. H. S. Torr, L. Bertinetto. Do different tracking tasks require different appearance models? In Proceedings of the 35th Conference on Neural Information Processing Systems, pp. 726–738, 2021.
A. Bewley, Z. Y. Ge, L. Ott, F. Ramos, B. Upcroft. Simple online and realtime tracking. In Proceedings of IEEE International Conference on Image Processing, Phoenix, USA, pp. 3464–3468, 2016. DOI: https://doi.org/10.1109/ICIP.2016.7533003.
N. Wojke, A. Bewley, D. Paulus. Simple online and real-time tracking with a deep association metric. In Proceedings of IEEE International Conference on Image Processing, Beijing, China, pp. 3645–3649, 2017. DOI: https://doi.org/10.1109/ICIP.2017.8296962.
Y. F. Zhang, C. Y. Wang, X. G. Wang, W. J. Zeng, W. Y. Liu. FairMOT: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, vol. 129, no. 11, pp. 3069–3087, 2021. DOI: https://doi.org/10.1007/s11263-021-01513-4.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62176041 and 62022021), the Joint Fund of the Ministry of Education for Equipment Pre-research, China (No. 8091B032155), the Science and Technology Innovation Foundation of Dalian, China (No. 2020 JJ26GX036), and the Fundamental Research Funds for the Central Universities, China (No. DUT21LAB127).

Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, China
Chang Liu, Xiao-Fan Chen, Chun-Juan Bo & Dong Wang
Dalian Minzu University, Dalian, 116600, China
Chun-Juan Bo

Authors

Chang Liu
Xiao-Fan Chen
Chun-Juan Bo
Dong Wang

Corresponding author

Correspondence to Dong Wang.

Additional information

Chang Liu received the B.Eng. degree in communication engineering from Dalian University of Technology, China in 2019. She is currently a Ph.D. candidate in signal and information processing at the School of Information and Communication Engineering, Dalian University of Technology, China.

Her research interest is visual object tracking.

Xiao-Fan Chen received the B.Eng. degree in computer science from Dalian University of Technology, China in 2017. She is currently a master's student in signal and information processing at the School of Information and Communication Engineering, Dalian University of Technology, China.

Her research interest is visual object tracking.

Chun-Juan Bo received the Ph.D. degree in signal and information processing from Dalian University of Technology, China in 2019. She is currently an associate professor with the College of Information and Communication Engineering, Dalian Minzu University, China.

Her research interests include image classification and object tracking.

Dong Wang received the B.Eng. degree in electronic information engineering and the Ph.D. degree in signal and information processing from Dalian University of Technology (DUT), China in 2008 and 2013, respectively. He is currently a full professor with the School of Information and Communication Engineering, DUT, China.

His research interests focus on object detection and tracking.

About this article

Cite this article

Liu, C., Chen, XF., Bo, CJ. et al. Long-term Visual Tracking: Review and Experimental Comparison. Mach. Intell. Res. 19, 512–530 (2022). https://doi.org/10.1007/s11633-022-1344-1

Received: 27 January 2022
Accepted: 06 June 2022
Published: 07 November 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11633-022-1344-1
