Abstract
Object tracking has achieved impressive performance in computer vision, but complex real-world scenarios still pose many challenges. Mainstream trackers mostly locate the object with two branches, which limits their ability to fully mine the similarity between the template and the search region. In this paper, we propose MultiBSP, a multi-branch and multi-scale perception object tracking framework based on Siamese convolutional neural networks. The multi-branch tracking framework is built on the idea of relation mining, and a tower-structured relation network is designed for each branch to learn the non-linear relation function between the template and the search region. Through branch combination, the branches can verify one another's predictions, which benefits robust tracking. To sense the scale and aspect ratio of the object in advance, a multi-scale perception module is designed using dilated convolutions at five scales, which improves the tracker's ability to handle scale variation. In addition, we propose an information enhancement module that focuses on important features and suppresses unnecessary ones along the spatial and channel dimensions. Extensive experiments on six visual tracking benchmarks (OTB100, VOT2018, VOT2019, UAV123, GOT-10k, and LaSOT) demonstrate that MultiBSP achieves robust tracking and state-of-the-art performance. Finally, ablation experiments verify the effectiveness of each module, and qualitative and quantitative analyses confirm the tracking stability.
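As a rough illustration of why dilated convolutions at several rates give multi-scale perception, the sketch below applies one kernel at five dilation rates in plain Python. It is not the MultiBSP module itself: the 1D setting, the function names, and the dilation rates (1 to 5) are assumptions made for illustration, since the abstract states only that five scales are used. The one fact relied on is the standard effective kernel span, k + (k - 1)(d - 1), for kernel size k and dilation d.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1D cross-correlation with `dilation` steps between taps."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap reads the input `dilation` positions after the previous tap.
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


# Hypothetical rates: the abstract says "five scales" but does not list them.
ASSUMED_DILATIONS = (1, 2, 3, 4, 5)


def multi_scale_responses(signal, kernel, dilations=ASSUMED_DILATIONS):
    """Apply the same kernel at every dilation rate, giving one response per scale."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    responses = multi_scale_responses(list(range(12)), [1, 0, -1])
    for d, r in responses.items():
        print(d, effective_kernel_size(3, d), r)
```

With a fixed 3-tap kernel, a larger dilation widens the covered span without adding parameters, so responses at different rates react to structures of different sizes.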
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant 61671002. The experiments in this paper are conducted on the High Performance Computing Platform of Beihang University and the Supercomputing Platform of School of Mathematical Sciences.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest in relation to the work in this article.
About this article
Cite this article
Jiang, J., Yang, X., Li, Z. et al. MultiBSP: multi-branch and multi-scale perception object tracking framework based on siamese CNN. Neural Comput & Applic 34, 18787–18803 (2022). https://doi.org/10.1007/s00521-022-07420-0
DOI: https://doi.org/10.1007/s00521-022-07420-0