Robust Template Adjustment Siamese Network for Object Visual Tracking
Figure 1. Visual comparison of TA-Siam with state-of-the-art trackers on four video sequences: soccer, kitesurf, skating1, and motor-rolling. TA-Siam denotes our proposed Template Adjustment Siamese Network tracker. The number in the top-left corner of each image is the video frame number. Each of the four sequences represents a different challenge. Compared with SiamBAN [6] and SiamRPN++ [7], our proposed tracker avoids bounding box drift and object loss.
Figure 2. Overview of our proposed TA-Siam framework. TAC denotes the Template Adjustment Controller. The template adjustment module is plug-and-play.
Figure 3. The rhombus classification labels and the four-side regression of the bounding box.
Figure 4. The distance between central points and the distance between diagonal points.
Figure 5. Expected Average Overlap (EAO) ranking of the evaluated trackers on the VOT2016 benchmark.
Figure 6. EAO ranking of the evaluated trackers on the VOT2018 benchmark.
Figure 7. Comparison of EAO and speed on the VOT2018 benchmark.
Figure 8. Precision plots and success plots on the OTB50 dataset.
Figure 9. Evaluation on the OTB100 dataset with four challenging attributes.
Figure 10. Comparison results on the GOT-10k benchmark.
Figure 11. Comparison with other trackers on the LaSOT test set in terms of precision, normalized precision, and success plots.
Figure 12. Four different sample label combinations: ellipses, rhombus-ellipse, ellipse-rhombus, and rhombuses.
Abstract
1. Introduction
- A plug-and-play template adjustment Siamese network is designed for visual tracking, which sharply reduces the risk of model drift and object loss;
- In the classification and regression branches, rhombus labels and an anchor-free strategy are used to accurately infer the center point and the four sides of the bounding box. In the training phase, the Distance-Intersection over Union (D-IoU) loss is used to train the anchor-free regression subnetwork.
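To make the second contribution more concrete, the following is a minimal sketch of the D-IoU loss in its standard published form: 1 − IoU plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. The function name `diou_loss`, the `(x1, y1, x2, y2)` box convention, and the `eps` stabilizer are assumptions made for illustration; the paper's training code is not part of this page and may differ.

```python
def diou_loss(pred, target, eps=1e-9):
    """Hypothetical helper: D-IoU loss for two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(px2, tx2) - min(px1, tx1)
    enc_h = max(py2, ty2) - min(py1, ty1)
    c2 = enc_w ** 2 + enc_h ** 2

    return 1.0 - iou + rho2 / (c2 + eps)


if __name__ == "__main__":
    print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # perfectly aligned boxes
    print(diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike a plain IoU loss, the distance term keeps growing as disjoint boxes move further apart, so the loss still distinguishes between predictions that do not overlap the ground truth at all.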
2. Related Work
2.1. Siamese Network
2.2. Template Updating
2.3. Anchor-Free Regression
3. Methods
3.1. Template Extraction and Adjustment
3.2. Classification Label Selection and Anchor-Free Regression
3.3. Loss Function with Distance Constraint
4. Experiments
4.1. Implementation Details
4.2. Comparison with State-of-the-Art
4.2.1. Results on VOT2016 and VOT2018
4.2.2. Results on OTB50 and OTB100
4.2.3. Results on GOT-10k
4.2.4. Results on LaSOT
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, L.; Gonzalez-Garcia, A.; van de Weijer, J.; Danelljan, M.; Khan, F.S. Learning the model update for siamese trackers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 4010–4019.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional siamese networks for object tracking. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 850–865.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8971–8980.
- Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Zhang, Z.; Peng, H. Ocean: Object-aware anchor-free tracking. arXiv 2020, arXiv:2006.10721.
- Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R. Siamese box adaptive network for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6668–6677.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. Siamrpn++: Evolution of siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4282–4291.
- Hadfield, S.; Bowden, R.; Lebeda, K. The Visual Object Tracking VOT2016 Challenge Results. Lect. Notes Comput. Sci. 2016, 9914, 777–823.
- Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin Zajc, L.; Vojir, T.; Bhat, G.; Lukezic, A.; Eldesokey, A. The sixth visual object tracking vot2018 challenge results. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
- Wu, Y.; Lim, J.; Yang, M.-H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418.
- Wu, Y.; Lim, J.; Yang, M. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848.
- Huang, L.; Zhao, X.; Huang, K. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 1.
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. Lasot: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 5374–5383.
- Tao, R.; Gavves, E.; Smeulders, A.W. Siamese instance search for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1420–1429.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4660–4669.
- Bhat, G.; Danelljan, M.; Gool, L.V.; Timofte, R. Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, CA, USA, 16–20 June 2019; pp. 6182–6191.
- Danelljan, M.; Gool, L.V.; Timofte, R. Probabilistic regression for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7183–7192.
- Yao, Y.; Wu, X.; Zhang, L.; Shan, S.; Zuo, W. Joint representation and truncated inference learning for correlation filter based tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 552–567.
- Choi, J.; Kwon, J.; Lee, K.M. Real-time visual tracking by deep reinforced decision making. Comput. Vis. Image Underst. 2018, 171, 10–19.
- Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic siamese network for visual object tracking. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1763–1771.
- Choi, J.; Chang, H.J.; Fischer, T.; Yun, S.; Lee, K.; Jeong, J.; Demiris, Y.; Choi, J.Y. Context-aware deep feature compression for high-speed visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 479–488.
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 4310–4318.
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. Centernet: Keypoint triplets for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6569–6578.
- Law, H.; Deng, J. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 9626–9635.
- Xu, Y.; Wang, Z.; Li, Z.; Yuan, Y.; Yu, G. Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12549–12556.
- Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6269–6277.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. Unitbox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5296–5305.
- Yu, Y.; Xiong, Y.; Huang, W.; Scott, M.R. Deformable Siamese attention networks for visual object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6728–6737.
- Hu, Q.; Zhou, L.; Wang, X.; Mao, Y.; Zhang, J.; Ye, Q. SPSTracker: Sub-Peak Suppression of Response Map for Robust Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10989–10996.
- Yang, T.; Xu, P.; Hu, R.; Chai, H.; Chan, A.B. ROAM: Recurrently optimizing tracking model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6718–6727.
- Wang, G.; Luo, C.; Xiong, Z.; Zeng, W. Spm-tracker: Series-parallel matching for real-time visual object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3643–3652.
- Wang, G.; Luo, C.; Sun, X.; Xiong, Z.; Zeng, W. Tracking by instance detection: A meta-learning approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6288–6297.
- Li, P.; Chen, B.; Ouyang, W.; Wang, D.; Yang, X.; Lu, H. Gradnet: Gradient-guided network for visual object tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6162–6171.
- Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P.H. End-to-end representation learning for correlation filter based tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2805–2813.
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Convolutional features for correlation filter based visual tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 11–18 December 2015; pp. 58–66.
- Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P.H. Staple: Complementary learners for real-time tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1401–1409.
- Lukezic, A.; Matas, J.; Kristan, M. D3S-A discriminative single shot segmentation tracker. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 7133–7142.
- Sauer, A.; Aljalbout, E.; Haddadin, S. Tracking holistic object representations. arXiv 2019, arXiv:1907.12920.
- Zhang, Z.; Peng, H. Deeper and wider siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4591–4600.
- Huang, L.; Zhao, X.; Huang, K. Globaltrack: A simple and strong baseline for long-term tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11037–11044.
Metric | SiamRPN | SPM | ROAM | SPS | SiamRPN++ | UpdateNet | SiamBAN | SiamAttn | Ours |
---|---|---|---|---|---|---|---|---|---|
EAO↑ | 0.337 | 0.434 | 0.441 | 0.459 | 0.478 | 0.481 | 0.494 | 0.537 | 0.555 |
A↑ | 0.578 | 0.620 | 0.599 | 0.625 | 0.637 | 0.610 | 0.632 | 0.680 | 0.628 |
R↓ | 0.312 | 0.210 | 0.174 | 0.158 | 0.177 | 0.210 | 0.158 | 0.140 | 0.107 |
Metric | FCOS-MAML | ATOM | SiamRPN++ | SiamFC++ | SPS | PrDiMP | SiamBAN | Retina-MAML | Ours |
---|---|---|---|---|---|---|---|---|---|
EAO↑ | 0.392 | 0.400 | 0.414 | 0.426 | 0.434 | 0.442 | 0.452 | 0.452 | 0.469 |
A↑ | 0.635 | 0.590 | 0.600 | 0.587 | 0.612 | 0.618 | 0.597 | 0.604 | 0.592 |
R↓ | 0.220 | 0.203 | 0.234 | 0.183 | 0.169 | 0.165 | 0.178 | 0.159 | 0.155 |
Metric | SiamDW | THOR | ROAM | SPM | SiamRPN++ | ATOM | SiamCAR | SiamFC++ | D3S | Ours |
---|---|---|---|---|---|---|---|---|---|---|
AO | 0.416 | 0.447 | 0.465 | 0.513 | 0.517 | 0.556 | 0.569 | 0.595 | 0.597 | 0.608 |
SR0.5 | 0.475 | 0.538 | 0.532 | 0.593 | 0.616 | 0.634 | 0.670 | 0.695 | 0.676 | 0.731 |
SR0.75 | 0.144 | 0.204 | 0.236 | 0.359 | 0.325 | 0.402 | 0.415 | 0.479 | 0.462 | 0.455 |
#Num | Components | EAO↑ | R (Failure Rate)↓ |
---|---|---|---|
① | baseline | 0.494 | 0.158 |
② | +DIOU | 0.505 | 0.149 |
③ | +TAM | 0.538 | 0.121 |
④ | +TAM+DIOU | 0.555 | 0.107 |
#Num | Label Shapes | AO↑ |
---|---|---|
① | Ellipses | 0.575 |
② | Rhombus + Ellipse | 0.581 |
③ | Ellipse + Rhombus | 0.577 |
④ | Rhombuses | 0.608 |
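The two label shapes compared in the table above can be illustrated with a small sketch. It assumes, purely for illustration, that each label region is inscribed in the ground-truth box with half-axes `w / 2` and `h / 2` and that grid points inside the region are marked positive. The exact sizes and positive/negative assignment rules used in the paper are not given on this page, and `label_mask` is a hypothetical helper.

```python
def label_mask(size, cx, cy, w, h, shape="rhombus"):
    """Hypothetical helper: binary label map on a size x size grid.

    Assumption: the region is inscribed in a box centered at (cx, cy)
    with width w and height h; points inside the region are labeled 1.
    """
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            # Offsets from the center, normalized by the half-axes.
            dx = abs(x - cx) / (w / 2.0)
            dy = abs(y - cy) / (h / 2.0)
            if shape == "rhombus":
                inside = dx + dy <= 1.0            # L1 ball (diamond)
            else:
                inside = dx * dx + dy * dy <= 1.0  # ellipse
            row.append(1 if inside else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    for shape in ("rhombus", "ellipse"):
        m = label_mask(9, 4, 4, 8, 8, shape)
        print(shape, sum(map(sum, m)))
        for row in m:
            print("".join("#" if v else "." for v in row))
```

Under these assumptions the rhombus is always contained in the ellipse with the same half-axes, so it marks fewer points near the box corners as positive.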
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Tang, C.; Qin, P.; Zhang, J. Robust Template Adjustment Siamese Network for Object Visual Tracking. Sensors 2021, 21, 1466. https://doi.org/10.3390/s21041466