Abstract
Visual object tracking is a challenging task because of the large appearance variations caused by illumination, deformation, and motion. Siamese network-based trackers, which locate the target through a matching function, are widely used for visual object tracking and can robustly recognize the target under appearance variations. However, although the filter template is a crucial component of such methods, most of them do not update it effectively and show limited ability to discriminate the target from objects with similar semantics (distractors). To tackle the challenge of distractors, we add a dynamic filter branch to the traditional Siamese network. When multiple peaks are detected on the static response map, the tracker re-detects the target with the dynamic branch, and the final target location is determined by combining the results of the dynamic and static filter branches. The sample library is then updated with a hard-negative-mining strategy, and the dynamic filter kernel is updated online. With the fusion of the two branches, the tracker can distinguish the true target from similar objects. We conduct extensive experiments and empirical evaluations on two popular datasets, VisDrone and UAV123, where our tracker achieves an AUC of 58% and 60.7%, respectively.
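The trigger-and-fuse logic in the abstract can be sketched in a few lines: detect whether the static response map has multiple strong peaks (a sign of distractors), and if so, blend it with the dynamic branch's response before taking the argmax. This is a minimal NumPy sketch under assumed parameters; the function names (`find_peaks`, `fuse_and_locate`), the peak-ratio threshold, and the linear fusion weight `alpha` are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def find_peaks(response, ratio=0.7):
    """Return coordinates of local maxima scoring at least
    `ratio` times the global maximum of the response map."""
    h, w = response.shape
    padded = np.pad(response, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    # a cell is a local maximum if it is >= all 8 neighbours
    is_max = np.ones((h, w), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            is_max &= center >= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    threshold = ratio * response.max()
    ys, xs = np.nonzero(is_max & (response >= threshold))
    return list(zip(ys.tolist(), xs.tolist()))

def fuse_and_locate(static_map, dynamic_map, alpha=0.5, peak_ratio=0.7):
    """If the static map is multi-peaked (distractors present),
    fuse it with the dynamic map; otherwise trust the static map."""
    peaks = find_peaks(static_map, peak_ratio)
    if len(peaks) > 1:  # distractors detected -> consult dynamic branch
        fused = (1 - alpha) * static_map + alpha * dynamic_map
    else:
        fused = static_map
    return np.unravel_index(np.argmax(fused), fused.shape)
```

For example, if the static map has a distractor peak almost as strong as the target peak while the dynamic branch responds only at the true target, the fused map shifts the argmax back to the target location.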
Communicated by I. IDE.
Shen, H., Lin, D., Song, T. et al. Anti-distractors: two-branch siamese tracker with both static and dynamic filters for object tracking. Multimedia Systems 26, 631–641 (2020). https://doi.org/10.1007/s00530-020-00670-9