Abstract
As a fundamental task in computer vision, visual object tracking has received much attention in recent years. Most studies focus on short-term visual tracking, which addresses shorter videos and always-visible targets. However, long-term visual tracking is much closer to practical applications and poses more complicated challenges. Long-term sequences last much longer, on the order of minutes or even hours, and the tracker must also handle more frequent target disappearance and reappearance. In this paper, we provide a thorough review of long-term tracking, summarizing long-term tracking algorithms from two perspectives: framework architectures and utilization of intermediate tracking results. Then we provide a detailed description of existing benchmarks and corresponding evaluation protocols. Furthermore, we conduct extensive experiments and analyse the performance of trackers on six benchmarks: VOTLT2018, VOTLT2019 (2020/2021), OxUvA, LaSOT, TLP and the long-term subset of VTUAV-V. Finally, we discuss the future prospects from multiple perspectives, including algorithm design and benchmark construction. To our knowledge, this is the first comprehensive survey for long-term visual object tracking. The relevant content is available at https://github.com/wangdong-dut/Long-term-Visual-Tracking.
References
M. Mueller, N. Smith, B. Ghanem. A benchmark and simulator for UAV tracking. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 445–461, 2016. DOI: https://doi.org/10.1007/978-3-319-46448-0_27.
A. Moudgil, V. Gandhi. Long-term visual object tracking benchmark. In Proceedings of the 14th Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 629–645, 2019. DOI: https://doi.org/10.1007/978-3-030-20890-5_40.
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. Now you see me: Evaluating performance in long-term visual tracking. [Online], Available: https://arxiv.org/abs/1804.07056, 2018.
Z. Kalal, K. Mikolajczyk, J. Matas. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409–1422, 2012. DOI: https://doi.org/10.1109/TPAMI.2011.239.
J. Valmadre, L. Bertinetto, J. F. Henriques, R. Tao, A. Vedaldi, A. W. M. Smeulders, P. H. S. Torr, E. Gavves. Long-term tracking in the wild: A benchmark. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 692–707, 2018. DOI: https://doi.org/10.1007/978-3-030-01219-9_41.
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. Performance evaluation methodology for long-term visual object tracking. [Online], Available: https://arxiv.org/abs/1906.08675, 2019.
Y. H. Zhang, L. J. Wang, D. Wang, J. Q. Qi, H. C. Lu. Learning regression and verification networks for robust long-term tracking. International Journal of Computer Vision, vol. 129, no. 9, pp. 2536–2547, 2021. DOI: https://doi.org/10.1007/s11263-021-01487-3.
B. Yan, H. J. Zhao, D. Wang, H. C. Lu, X. Y. Yang. ‘Skimming-perusal’ tracking: A framework for real-time and robust long-term tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 2385–2393, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00247.
K. N. Dai, Y. H. Zhang, D. Wang, J. H. Li, H. C. Lu, X. Y. Yang. High-performance long-term tracking with meta-updater. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6297–6306, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00633.
C. Mayer, M. Danelljan, D. P. Paudel, L. Van Gool. Learning target candidate association to keep track of what not to track. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 13424–13434, 2021. DOI: https://doi.org/10.1109/ICCV48922.2021.01319.
P. Voigtlaender, J. Luiten, P. H. S. Torr, B. Leibe. Siam R-CNN: Visual tracking by re-detection. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6577–6587, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00661.
X. Q. Zhang, R. H. Jiang, C. X. Fan, T. Y. Tong, T. Wang, P. C. Huang. Advances in deep learning methods for visual tracking: Literature review and fundamentals. International Journal of Automation and Computing, vol. 18, no. 3, pp. 311–333, 2021. DOI: https://doi.org/10.1007/s11633-020-1274-8.
P. X. Li, D. Wang, L. J. Wang, H. C. Lu. Deep visual tracking: Review and experimental comparison. Pattern Recognition, vol. 76, pp. 323–338, 2018. DOI: https://doi.org/10.1016/j.patcog.2017.11.007.
S. M. Marvasti-Zadeh, L. Cheng, H. Ghanei-Yakhdan, S. Kasaei. Deep learning for visual tracking: A comprehensive survey. IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 5, pp. 3943–3968, 2022. DOI: https://doi.org/10.1109/TITS.2020.3046478.
D. S. Bolme, J. R. Beveridge, B. A. Draper, Y. M. Lui. Visual object tracking using adaptive correlation filters. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, San Francisco, USA, pp. 2544–2550, 2010. DOI: https://doi.org/10.1109/CVPR.2010.5539960.
J. F. Henriques, R. Caseiro, P. Martins, J. Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the 12th European Conference on Computer Vision, Springer, Florence, Italy, pp. 702–715, 2012. DOI: https://doi.org/10.1007/978-3-642-33765-9_50.
J. F. Henriques, R. Caseiro, P. Martins, J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583–596, 2015. DOI: https://doi.org/10.1109/TPAMI.2014.2345390.
Y. Li, J. K. Zhu. A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 254–265, 2015. DOI: https://doi.org/10.1007/978-3-319-16181-5_18.
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Accurate scale estimation for robust visual tracking. In Proceedings of the British Machine Vision Conference, BMVA Press, Nottingham, UK, pp. 1–11, 2014. DOI: https://doi.org/10.5244/C.28.65.
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In Proceedings of IEEE International Conference on Computer Vision, Santiago, Chile, pp. 4310–4318, 2015. DOI: https://doi.org/10.1109/ICCV.2015.490.
M. Danelljan, A. Robinson, F. S. Khan, M. Felsberg. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 472–488, 2016. DOI: https://doi.org/10.1007/978-3-319-46454-1_29.
M. Danelljan, G. Bhat, F. S. Khan, M. Felsberg. ECO: Efficient convolution operators for tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 6931–6939, 2017. DOI: https://doi.org/10.1109/CVPR.2017.733.
R. Tao, E. Gavves, A. W. M. Smeulders. Siamese instance search for tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 1420–1429, 2016. DOI: https://doi.org/10.1109/CVPR.2016.158.
L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, P. H. S. Torr. Fully-convolutional siamese networks for object tracking. In Proceedings of the European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 850–865, 2016. DOI: https://doi.org/10.1007/978-3-319-48881-3_56.
B. Li, J. J. Yan, W. Wu, Z. Zhu, X. L. Hu. High performance visual tracking with Siamese region proposal network. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 8971–8980, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00935.
Y. D. Xu, Z. Y. Wang, Z. X. Li, Y. Yuan, G. Yu. Siam-FC++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 12549–12556, 2020. DOI: https://doi.org/10.1609/aaai.v34i07.6944.
Z. P. Zhang, H. W. Peng, J. L. Fu, B. Li, W. M. Hu. Ocean: Object-aware anchor-free tracking. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 771–787, 2020. DOI: https://doi.org/10.1007/978-3-030-58589-1_46.
Z. D. Chen, B. N. Zhong, G. R. Li, S. P. Zhang, R. R. Ji. Siamese box adaptive network for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6667–6676, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00670.
D. Y. Guo, J. Wang, Y. Cui, Z. H. Wang, S. Y. Chen. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 6268–6276, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00630.
Z. Zhu, Q. Wang, B. Li, W. Wu, J. J. Yan, W. M. Hu. Distractor-aware siamese networks for visual object tracking. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 103–119, 2018. DOI: https://doi.org/10.1007/978-3-030-01240-3_7.
B. Li, W. Wu, Q. Wang, F. Y. Zhang, J. L. Xing, J. J. Yan. SiamRPN++: Evolution of siamese visual tracking with very deep networks. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4277–4286, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00441.
Z. P. Zhang, H. W. Peng. Deeper and wider siamese networks for real-time visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4586–4595, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00472.
M. Danelljan, G. Bhat, F. S. Khan, M. Felsberg. ATOM: Accurate tracking by overlap maximization. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 4655–4664, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00479.
G. Bhat, M. Danelljan, L. Van Gool, R. Timofte. Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, Long Beach, USA, pp. 6181–6190, 2019. DOI: https://doi.org/10.1109/ICCV.2019.00628.
M. Danelljan, L. Van Gool, R. Timofte. Probabilistic regression for visual tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, USA, pp. 7181–7190, 2020. DOI: https://doi.org/10.1109/CVPR42600.2020.00721.
G. Bhat, M. Danelljan, L. Van Gool, R. Timofte. Know your surroundings: Exploiting scene information for object tracking. In Proceedings of the 16th European Conference on Computer Vision, Springer, Glasgow, UK, pp. 205–221, 2020. DOI: https://doi.org/10.1007/978-3-030-58592-1_13.
X. Chen, B. Yan, J. W. Zhu, D. Wang, X. Y. Yang, H. C. Lu. Transformer tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 8122–8131, 2021. DOI: https://doi.org/10.1109/CVPR46437.2021.00803.
B. Yan, H. W. Peng, J. L. Fu, D. Wang, H. C. Lu. Learning spatio-temporal transformer for visual tracking. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 10428–10437, 2021. DOI: https://doi.org/10.1109/ICCV48922.2021.01028.
S. Karthik, A. Moudgil, V. Gandhi. Exploring 3 R’s of long-term tracking: Re-detection, recovery and reliability. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, Snowmass, USA, pp. 1000–1009, 2020. DOI: https://doi.org/10.1109/WACV45572.2020.9093465.
T. P. Kuipers, D. Arya, D. K. Gupta. Hard occlusions in visual object tracking. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 299–314, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_22.
A. Lukežič, U. Kart, J. Käpylä, A. Durmush, J. K. Kämäräinen, J. Matas, M. Kristan. CDTB: A color and depth visual object tracking dataset and benchmark. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Seoul, Korea, pp. 10012–10021, 2019. DOI: https://doi.org/10.1109/ICCV.2019.01011.
Y. L. Qian, S. Yan, A. Lukežič, M. Kristan, J. K. Kämäräinen, J. Matas. DAL: A deep depth-aware long-term tracker. In Proceedings of the 25th International Conference on Pattern Recognition, IEEE, Milan, Italy, pp. 7825–7832, 2021.
U. Kart, A. Lukežič, M. Kristan, J. K. Kämäräinen, J. Matas. Object tracking by reconstruction with view-specific discriminative correlation filters. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 1339–1348, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00143.
G. Nebehay, R. Pflugfelder. Clustering of static-adaptive correspondences for deformable object tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 2784–2791, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298895.
Y. Hua, K. Alahari, C. Schmid. Occlusion and motion reasoning for long-term tracking. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 172–187, 2014. DOI: https://doi.org/10.1007/978-3-319-10599-4_12.
C. Ma, X. K. Yang, C. Y. Zhang, M. H. Yang. Long-term correlation tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 5388–5396, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7299177.
N. Wang, W. G. Zhou, H. Q. Li. Reliable re-detection for long-term tracking. IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 3, pp. 730–743, 2019. DOI: https://doi.org/10.1109/TCSVT.2018.2816570.
L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, P. H. S. Torr. Staple: Complementary learners for real-time tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 1401–1409, 2016. DOI: https://doi.org/10.1109/CVPR.2016.156.
H. Fan, H. B. Ling. Parallel tracking and verifying. IEEE Transactions on Image Processing, vol. 28, no. 8, pp. 4130–4144, 2019. DOI: https://doi.org/10.1109/TIP.2019.2904789.
M. Danelljan, G. Häger, F. S. Khan, M. Felsberg. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 8, pp. 1561–1575, 2017. DOI: https://doi.org/10.1109/TPAMI.2016.2609928.
Z. B. Hong, Z. Chen, C. H. Wang, X. Mei, D. Prokhorov, D. C. Tao. Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, pp. 749–758, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298675.
N. X. Liang, G. L. Wu, W. X. Kang, Z. Y. Wang, D. D. Feng. Real-time long-term tracking with prediction-detection-correction. IEEE Transactions on Multimedia, vol. 20, no. 9, pp. 2289–2302, 2018. DOI: https://doi.org/10.1109/TMM.2018.2803518.
J. W. Liao, C. Qi, J. Z. Cao, L. Ren, G. P. Zhang. Real-time long-term tracker with tracking-verification-detection-refinement. Journal of Visual Communication and Image Representation, vol. 72, Article number 102896, 2020. DOI: https://doi.org/10.1016/j.jvcir.2020.102896.
A. Lukežič, L. Č. Zajc, T. Vojíř, J. Matas, M. Kristan. FuCoLoT-a fully-correlational long-term tracker. In Proceedings of the 14th Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 595–611, 2019. DOI: https://doi.org/10.1007/978-3-030-20890-5_38.
A. Lukežič, T. Vojíř, L. Č. Zajc, J. Matas, M. Kristan. Discriminative correlation filter with channel and spatial reliability. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 4847–4856, 2017. DOI: https://doi.org/10.1109/CVPR.2017.515.
Z. P. Wang, H. Wang, B. F. Fang, C. J. Xie. Support vector correlation filter with long-term tracking. Signal, Image and Video Processing, vol. 12, no. 8, pp. 1541–1549, 2018. DOI: https://doi.org/10.1007/s11760-018-1310-0.
F. Tang, Q. Ling. Contour-aware long-term tracking with reliable re-detection. IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 12, pp. 4739–4754, 2020. DOI: https://doi.org/10.1109/TCSVT.2019.2957748.
H. Lee, S. Choi, C. Kim. A memory model based on the siamese network for long-term tracking. In Proceedings of the European Conference on Computer Vision Workshops, Springer, Munich, Germany, pp. 100–115, 2019. DOI: https://doi.org/10.1007/978-3-030-11009-3_5.
E. Gavves, R. Tao, D. K. Gupta, A. W. M. Smeulders. Model decay in long-term tracking. In Proceedings of the 25th International Conference on Pattern Recognition, IEEE, Milan, Italy, pp. 2685–2692, 2021. DOI: https://doi.org/10.1109/ICPR48806.2021.9412648.
A. G. Howard, M. L. Zhu, B. Chen, D. Kalenichenko, W. J. Wang, T. Weyand, M. Andreetto, H. Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. [Online], Available: https://arxiv.org/abs/1704.04861, 2017.
H. Nam, B. Han. Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 4293–4302, 2016. DOI: https://doi.org/10.1109/CVPR.2016.465.
H. Wu, X. Y. Yang, Y. Yang, G. Z. Liu. Flow guided short-term trackers with cascade detection for long-term tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 170–178, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00026.
K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. [Online], Available: https://arxiv.org/abs/1409.1556, 2014.
K. M. He, X. Y. Zhang, S. Q. Ren, J. Sun. Deep residual learning for image recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 770–778, 2016. DOI: https://doi.org/10.1109/CVPR.2016.90.
M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, L. Č. Zajc, T. Vojir, G. Bhat, A. Lukežič, A. Eldesokey, G. Fernández, Á. García-Martín, Á. Iglesias-Arias, A. A. Alatan, A. González-García, A. Petrosino, A. Memarmoghadam, A. Vedaldi, A. Muhič, A. F. He, A. Smeulders, A. G. Perera, B. Li, B. Y. Chen, C. Kim, C. S. Xu, C. Z. Xiong, C. Tian, C. Luo, C. Sun, C. Hao, D. Kim, D. Mishra, D. M. Chen, D. Wang, D. Wee, E. Gavves, E. Gundogdu, E. Velasco-Salido, F. S. Khan, F. Yang, F. Zhao, F. Li, F. Battistone, G. De Ath, G. R. K. S. Subrahmanyam, G. Bastos, H. B. Ling, H. K. Galoogahi, H. Lee, H. J. Li, H. J. Zhao, H. Fan, H. G. Zhang, H. Possegger, H. Q. Li, H. C. Lu, H. Zhi, H. Y. Li, H. Lee, H. J. Chang, I. Drummond, J. Valmadre, J. S. Martin, J. Chahl, J. Y. Choi, J. Li, J. Q. Wang, J. Q. Qi, J. Sung, J. Johnander, J. Henriques, J. Choi, J. Van De weijer, J. R. Herranz, J. M. Martínez, J. Kittler, J. F. Zhuang, J. Y. Gao, K. Grm, L. C. Zhang, L. J. Wang, L. X. Yang, L. Rout, L. Si, L. Bertinetto, L. T. Chu, M. Q. Che, M. E. Maresca, M. Danelljan, M. H. Yang, M. Abdelpakey, M. Shehata, M. Y. N. G. Kang, N. Lee, N. Wang, O. Miksik, P. Moallem, P. Vicente-Moñivar, P. Senna, P. X. Li, P. Torr, P. M. Raju, Q. Ruihe, Q. Wang, Q. Zhou, Q. Guo, R. Martín-Nieto, R. K. Gorthi, R. Tao, R. Bowden, R. Everson, R. L. Wang, S. Yun, S. Choi, S. Vivas, S. Bai, S. P. Huang, S. H. Wu, S. Hadfield, S. W. Wang, S. Golodetz, T. Ming, T. Y. Xu, T. Z. Zhang, T. Fischer, V. Santopietro, V. Štruc, W. Wei, W. M. Zuo, W. Feng, W. Wu, W. Zou, W. M. Hu, W. G. Zhou, W. J. Zeng, X. F. Zhang, X. H. Wu, X. J. Wu, X. M. Tian, Y. Li, Y. Lu, Y. W. Law, Y. Wu, Y. Demiris, Y. C. Yang, Y. F. Jiao, Y. H. Li, Y. H. Zhang, Y. X. Sun, Z. Zhang, Z. Zhu, Z. H. Feng, Z. H. Wang, Z. Q. He. The sixth visual object tracking VOT2018 challenge results. In Proceedings of the European Conference on Computer Vision Workshops, Springer, Munich, Germany, pp. 3–53, 2019. DOI: https://doi.org/10.1007/978-3-030-11009-3_1.
M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, L. C. Zajc, O. Drbohlav, A. Lukezic, A. Berg, A. Eldesokey, J. Käpylä, G. Fernández, A. Gonzalez-Garcia, A. Memarmoghadam, A. D. Lu, A. F. He, A. Varfolomieiev, A. Chan, A. S. Tripathi, A. Smeulders, B. S. Pedasingu, B. X. Chen, B. P. Zhang, B. Y. Wu, B. Li, B. He, B. Yan, B. Bai, B. Li, B. Li, B. H. Kim, C. Ma, C. Fang, C. Qian, C. Chen, C. L. Li, C. Q. Zhang, C. Y. Tsai, C. Luo, C. Micheloni, C. H. Zhang, D. C. Tao, D. Gupta, D. J. Song, D. Wang, E. Gavves, E. Yi, F. S. Khan, F. Y. Zhang, F. Wang, F. Zhao, G. De Ath, G. Bhat, G. Q. Chen, G. T. Li, H. Cevikalp, H. Du, H. J. Zhao, H. Saribas, H. M. Jung, H. L. Bai, H. Y. Yu, H. Y. Yu, H. W. Peng, H. C. Lu, H. Li, J. K. Li, J. H. Li, J. L. Fu, J. Chen, G. Gao, J. Zhao, J. Tang, J. Li, J. J. Wu, J. T. Liu, J. Q. Wang, J. Q. Qi, J. Y. Zhang, J. K. Tsotsos, J. H. Lee, J. van de Weijer, J. Kittler, J. H. Lee, J. F. Zhuang, K. K. Zhang, K. K. Wang, K. N. Dai, L. Chen, L. Liu, L. D. Guo, L. Zhang, L. Wang, L. L. Wang, L. C. Zhang, L. J. Wang, L. J. Zhou, L. Y. Zheng, L. T. Rout, L. Van Gool, L. Bertinetto, M. Danelljan, M. Dunnhofer, M. Ni, M. Y. Kim, M. Tang, M. H. Yang, N. Paluru, N. Martinel, P. F. Xu, P. F. Zhang, P. K. Zheng, P. Y. Zhang, P. H. S. Torr, Q. Z. Q. Wang, Q. Guo, R. Timofte, R. K. Gorthi, R. Everson, R. Z. Han, R. H. Zhang, S. You, S. C. Zhao, S. W. Zhao, S. H. Li, S. K. Li, S. M. Ge, S. Bai, S. S. Guan, T. F. Xing, T. Y. Xu, T. Y. Yang, T. Zhang, T. Vojir, W. Feng, W. M. Hu, W. Z. Wang, W. J. Tang, W. J. Zeng, W. Y. Liu, X. Chen, X. Qiu, X. Bai, X. J. Wu, X. Y. Yang, X. E. Chen, X. Li, X. Sun, X. Y. Chen, X. M. Tian, X. Tang, X. F. Zhu, Y. Huang, Y. N. Chen, Y. C. Lian, Y. Gu, Y. Liu, Y. J. Chen, Y. Zhang, Y. D. Xu, Y. M. Wang, Y. P. Li, Y. Zhou, Y. Dong, Y. F. Xu, Y. H. Zhang, Y. K. Li, Z. W. Z. Luo, Z. L. Zhang, Z. H. Feng, Z. Y. He, Z. C. Song, Z. H. Chen, Z. P. Zhang, Z. R. Wu, Z. W. Xiong, Z. J. Huang, Z. Teng, Z. H. Ni. The seventh visual object tracking VOT2019 challenge results. In Proceedings of IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 2206–2241, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00276.
M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, M. Danelljan, L. Č. Zajc, A. Lukežič, O. Drbohlav, L. B. He, Y. S. Zhang, S. Yan, J. Y. Yang, G. Fernández, A. Hauptmann, A. Memarmoghadam, Á. García-Martín, A. Robinson, A. Varfolomieiev, A. H. Gebrehiwot, B. Uzun, B. Yan, B. Li, C. Qian, C. Y. Tsai, C. Micheloni, D. Wang, F. Wang, F. Xie, F. J. Lawin, F. Gustafsson, G. L. Foresti, G. Bhat, G. Q. Chen, H. B. Ling, H. T. Zhang, H. Cevikalp, H. J. Zhao, H. R. Bai, H. C. Kuchibhotla, H. Saribas, H. Fan, H. Ghanei-Yakhdan, H. Q. Li, H. W. Peng, H. C. Lu, H. Li, J. Khaghani, J. Bescos, J. H. Li, J. L. Fu, J. Q. Yu, J. T. Xu, J. Kittler, J. Yin, J. Lee, K. C. Yu, K. W. Liu, K. Yang, K. N. Dai, L. Cheng, L. Zhang, L. J. Wang, L. Y. Wang, L. Van Gool, L. Bertinetto, M. Dunnhofer, M. Cheng, M. M. Dasari, N. Wang, N. Wang, P. Y. Zhang, P. H. S. Torr, Q. Wang, R. Timofte, R. K. S. Gorthi, S. Choi, S. M. Marvasti-Zadeh, S. C. Zhao, S. Kasaei, S. M. Qiu, S. H. Chen, T. B. Schön, T. Y. Xu, W. Lu, W. M. Hu, W. G. Zhou, X. Qiu, X. Ke, X. J. Wu, X. L. Zhang, X. Y. Yang, X. F. Zhu, Y. J. Jiang, Y. M. Wang, Y. W. Chen, Y. Ye, Y. Z. Li, Y. Yao, Y. Lee, Y. Z. Gu, Z. Z. Wang, Z. Y. Tang, Z. H. Feng, Z. J. Mai, Z. P. Zhang, Z. R. Wu, Z. A. Ma. The eighth visual object tracking VOT2020 challenge results. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 547–601, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_39.
Q. Wang, L. Zhang, L. Bertinetto, W. M. Hu, P. H. S. Torr. Fast online object tracking and segmentation: A unifying approach. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 1328–1338, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00142.
W. H. Zhang, H. R. Wang, Z. J. Huang, Y. X. Li, J. L. Zhou, L. C. Jiao. Accuracy and long-term tracking via overlap maximization integrated with motion continuity. In Proceedings of IEEE/CVF International Conference on Computer Vision Workshops, IEEE, Seoul, Korea, pp. 109–117, 2019. DOI: https://doi.org/10.1109/ICCVW.2019.00019.
S. Choi, J. Lee, Y. S. Lee, A. Hauptmann. Robust long-term object tracking via improved discriminative model prediction. In Proceedings of the European Conference on Computer Vision, Springer, Glasgow, UK, pp. 602–617, 2020. DOI: https://doi.org/10.1007/978-3-030-68238-5_40.
G. Zhu, F. Porikli, H. D. Li. Beyond local search: Tracking objects everywhere with instance-specific proposals. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 943–951, 2016. DOI: https://doi.org/10.1109/CVPR.2016.108.
C. L. Zitnick, P. Dollár. Edge boxes: Locating object proposals from edges. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 391–405, 2014. DOI: https://doi.org/10.1007/978-3-319-10602-1_26.
H. Liu, Q. Y. Hu, B. Li, Y. L. Guo. Robust long-term tracking via instance-specific proposals. IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 4, pp. 950–962, 2020. DOI: https://doi.org/10.1109/TIM.2019.2908715.
D. Q. Sun, X. D. Yang, M. Y. Liu, J. Kautz. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, USA, pp. 8934–8943, 2018. DOI: https://doi.org/10.1109/CVPR.2018.00931.
J. Q. Wang, K. Chen, S. Yang, C. C. Loy, D. H. Lin. Region proposal by guided anchoring. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 2960–2969, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00308.
S. Q. Ren, K. M. He, R. Girshick, J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017. DOI: https://doi.org/10.1109/TPAMI.2016.2577031.
I. Jung, J. Son, M. Baek, B. Han. Real-time MDNet. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 89–104, 2018. DOI: https://doi.org/10.1007/978-3-030-01225-0_6.
M. E. Maresca, A. Petrosino. MATRIOSKA: A multi-level approach to fast tracking by learning. In Proceedings of the International Conference on Image Analysis and Processing, Springer, Naples, Italy, pp. 419–428, 2013. DOI: https://doi.org/10.1007/978-3-642-41184-7_43.
J. S. Supancic III, D. Ramanan. Self-paced learning for long-term tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, pp. 2379–2386, 2013. DOI: https://doi.org/10.1109/CVPR.2013.308.
A. Dave, P. Tokmakov, C. Schmid, D. Ramanan. Learning to track any object. [Online], Available: https://arxiv.org/abs/1910.11844, 2019.
K. M. He, G. Gkioxari, P. Dollár, R. Girshick. Mask R-CNN. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 386–397, 2020. DOI: https://doi.org/10.1109/TPAMI.2018.2844175.
Z. K. Zhang, B. N. Zhong, S. P. Zhang, Z. J. Tang, X. Liu, Z. X. Zhang. Distractor-aware fast tracking via dynamic convolutions and MOT philosophy. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, USA, pp. 1024–1033, 2021. DOI: https://doi.org/10.1109/CVPR46437.2021.00108.
L. H. Huang, X. Zhao, K. Q. Huang. GlobalTrack: A simple and strong baseline for long-term tracking. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 11037–11044, 2020. DOI: https://doi.org/10.1609/aaai.v34i07.6758.
J. Choi, J. Kwon, K. M. Lee. Visual tracking by Trident Align and context embedding. In Proceedings of the 15th Asian Conference on Computer Vision, Springer, Kyoto, Japan, pp. 504–520, 2021. DOI: https://doi.org/10.1007/978-3-030-69532-3_31.
Z. B. Li, Q. Wang, J. Gao, B. Li, W. M. Hu. Globally spatial-temporal perception: A long-term tracking system. In Proceedings of IEEE International Conference on Image Processing, Abu Dhabi, UAE, pp. 2066–2070, 2020. DOI: https://doi.org/10.1109/ICIP40778.2020.9191319.
X. Wang, Z. Chen, J. Tang, B. Luo, Y. W. Wang, Y. H. Tian, F. Wu. Dynamic attention guided multi-trajectory analysis for single object tracking. IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 12, pp. 4895–4908, 2021. DOI: https://doi.org/10.1109/TCSVT.2021.3056684.
Y. Wu, J. Lim, M. H. Yang. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1834–1848, 2015. DOI: https://doi.org/10.1109/TPAMI.2014.2388226.
M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kämäräinen, H. J. Chang, M. Danelljan, L. Č. Zajc, A. Lukežič, O. Drbohlav, J. Käpylä, G. Häger, S. Yan, J. Y. Yang, Z. Q. Zhang, G. Fernández, M. Abdelpakey, G. Bhat, L. Cerkezi, H. Cevikalp, S. Y. Chen, X. Chen, M. Cheng, Z. Y. Cheng, Y. C. Chiu, O. Cirakman, Y. T. Cui, K. N. Dai, M. M. Dasari, Q. Deng, X. P. Dong, D. K. Du, M. Dunnhofer, Z. H. Feng, Z. Y. Feng, Z. H. Fu, S. M. Ge, R. K. Gorthi, Y. Z. Gu, B. Gunsel, Q. Guo, F. Gurkan, W. C. Han, Y. Y. Huang, F. J. Lawin, S. J. Jhang, R. G. Ji, C. Jiang, Y. J. Jiang, F. Juefei-Xu, Y. Jun, X. Ke, F. S. Khan, B. H. Kim, J. Kittler, X. Y. Lan, J. H. Lee, B. Leibe, H. Li, J. H. Li, X. X. Li, Y. Z. Li, B. Liu, C. Liu, J. G. Liu, L. Liu, Q. J. Liu, H. C. Lu, W. Lu, J. Luiten, J. Ma, Z. Ma, N. Martinel, C. Mayer, A. Memarmoghadam, C. Micheloni, Y. Z. Niu, D. Paudel, H. W. Peng, S. M. Qiu, A. Rajiv, M. Rana, A. Robinson, H. Saribas, L. Shao, M. Shehata, F. Shen, J. B. Shen, K. Simonato, X. N. Song, Z. Y. Tang, R. Timofte, P. Torr, C. Y. Tsai, B. Uzun, L. Van Gool, P. Voigtlaender, D. Wang, G. T. Wang, L. L. Wang, L. J. Wang, L. M. Wang, L. Y. Wang, Y. Wang, Y. H. Wang, C. Y. Wu, G. S. Wu, X. J. Wu, F. Xie, T. Y. Xu, X. Xu, W. L. Xue, B. Yan, W. K. Yang, X. Y. Yang, Y. Ye, J. Yin, C. W. Zhang, C. H. Zhang, H. T. Zhang, K. H. Zhang, K. K. Zhang, X. H. Zhang, X. L. Zhang, X. Y. Zhang, Z. B. Zhang, S. C. Zhao, M. Zhen, B. N. Zhong, J. W. Zhu, X. F. Zhu. The ninth visual object tracking VOT2021 challenge results. In Proceedings of IEEE/CVF International Conference on Computer Vision, IEEE, Montreal, Canada, pp. 2711–2738, 2021. DOI: https://doi.org/10.1109/ICCVW54120.2021.00305.
H. Fan, L. T. Lin, F. Yang, P. Chu, G. Deng, S. J. Yu, H. X. Bai, Y. Xu, C. Y. Liao, H. B. Ling. LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, USA, pp. 5369–5378, 2019. DOI: https://doi.org/10.1109/CVPR.2019.00552.
M. Müller, A. Bibi, S. Giancola, S. Alsubaihi, B. Ghanem. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 310–327, 2018. DOI: https://doi.org/10.1007/978-3-030-01246-5_19.
P. Y. Zhang, J. Zhao, D. Wang, H. C. Lu, X. Ruan. Visible-thermal UAV tracking: A large-scale benchmark and new baseline. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022.
T. Y. Yang, A. B. Chan. Learning dynamic memory networks for object tracking. In Proceedings of the 15th European Conference on Computer Vision, Springer, Munich, Germany, pp. 153–169, 2018. DOI: https://doi.org/10.1007/978-3-030-01240-3_10.
Z. D. Wang, H. S. Zhao, Y. L. Li, S. J. Wang, P. H. S. Torr, L. Bertinetto. Do different tracking tasks require different appearance models? In Proceedings of the 35th Conference on Neural Information Processing Systems, pp. 726–738, 2021.
A. Bewley, Z. Y. Ge, L. Ott, F. Ramos, B. Upcroft. Simple online and realtime tracking. In Proceedings of IEEE International Conference on Image Processing, Phoenix, USA, pp. 3464–3468, 2016. DOI: https://doi.org/10.1109/ICIP.2016.7533003.
N. Wojke, A. Bewley, D. Paulus. Simple online and real-time tracking with a deep association metric. In Proceedings of IEEE International Conference on Image Processing, Beijing, China, pp. 3645–3649, 2017. DOI: https://doi.org/10.1109/ICIP.2017.8296962.
Y. F. Zhang, C. Y. Wang, X. G. Wang, W. J. Zeng, W. Y. Liu. FairMOT: On the fairness of detection and re-identification in multiple object tracking. International Journal of Computer Vision, vol. 129, no. 11, pp. 3069–3087, 2021. DOI: https://doi.org/10.1007/s11263-021-01513-4.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 62176041 and 62022021), the Joint Fund of Ministry of Education for Equipment Pre-research, China (No. 8091B032155), the Science and Technology Innovation Foundation of Dalian, China (No. 2020 JJ26GX036), and the Fundamental Research Funds for the Central Universities, China (No. DUT21LAB127).
Author information
Chang Liu received the B. Eng. degree in communication engineering from Dalian University of Technology, China in 2019. She is currently a Ph. D. candidate in signal and information processing at School of Information and Communication Engineering, Dalian University of Technology, China.
Her research direction is visual object tracking.
Xiao-Fan Chen received the B. Eng. degree in computer science from Dalian University of Technology, China in 2017. She is currently a master student in signal and information processing at School of Information and Communication Engineering, Dalian University of Technology, China.
Her research direction is visual object tracking.
Chun-Juan Bo received the Ph. D. degree in signal and information processing from Dalian University of Technology, China in 2019. She is currently an associate professor with College of Information and Communication Engineering, Dalian Minzu University, China.
Her research interests include image classification and object tracking.
Dong Wang received the B. Eng. degree in electronic information engineering and the Ph. D. degree in signal and information processing from Dalian University of Technology (DUT), China in 2008 and 2013, respectively. He is currently a full professor with School of Information and Communication Engineering, DUT, China.
His research interests focus on object detection and tracking.
Cite this article
Liu, C., Chen, XF., Bo, CJ. et al. Long-term Visual Tracking: Review and Experimental Comparison. Mach. Intell. Res. 19, 512–530 (2022). https://doi.org/10.1007/s11633-022-1344-1
DOI: https://doi.org/10.1007/s11633-022-1344-1