Online Siamese Network for Visual Object Tracking
<p><b>Figure 2.</b> The framework of the Siamese network in OSNV. <b>Left</b>: <b>input data</b> for updating the <b>Siamese network</b>, which are the outputs of layer conv3 in VGG-M. Our proposed <b>Siamese network</b> lies in the middle; the two branches share the same configuration and weights. <b>Right</b>: the improved contrastive loss, which serves as the <b>loss function</b> to propagate gradients for updating the <b>Siamese network</b>. Dpout denotes a dropout layer. Best viewed in colour.</p>
<p><b>Figure 3.</b> Histograms of feature maps under different feature models. <b>Left</b>: histogram of VGG-M; <b>right</b>: histogram of our Siamese network.</p>
<p><b>Figure 4.</b> Ablation study results of OSNV on OTB-2013, together with four additional variants. The <b>left</b> plot shows precision as a function of the location error threshold, with precision scores given in the legend. The <b>right</b> plot shows the success plots of OPE on OTB-2013. Best viewed in colour.</p>
<p><b>Figure 5.</b> OPE results of 10 tracking algorithms on OTB-2013. The <b>left side</b> shows precision plots and the <b>right side</b> shows success plots.</p>
<p><b>Figure 6.</b> Robustness evaluation results of nine tracking algorithms on OTB-2013. The <b>left side</b> shows the success plots of spatial robustness evaluation (SRE), and the <b>right side</b> shows the success plots of temporal robustness evaluation (TRE).</p>
<p><b>Figure 7.</b> Attribute-based TRE evaluation results of nine tracking algorithms on OTB-2013.</p>
<p><b>Figure 8.</b> OPE results of 10 tracking algorithms on OTB-2015. The <b>left side</b> shows precision plots and the <b>right side</b> shows success plots.</p>
<p><b>Figure 9.</b> OPE results of 10 tracking algorithms on OTB-50. The <b>left side</b> shows precision plots and the <b>right side</b> shows success plots.</p>
<p><b>Figure 10.</b> OPE results of six tracking algorithms on TempleColor. The <b>left side</b> shows precision plots and the <b>right side</b> shows success plots.</p>
<p><b>Figure 11.</b> Qualitative performance of our proposed algorithm (OSNV), SINT_noflow and SiamFC_3s on eight challenging video sequences (from top to bottom: BlurCar, BlurFace, BlurBody, KiteSurf, Soccer, Bolt2, Human3a and Liquor).</p>
<p><b>Figure 12.</b> Failure cases of the OSNV algorithm on three video sequences: Diving, Ironman and Jump, from top to bottom.</p>
Abstract
1. Introduction
- An online Siamese network is proposed that can learn the domain knowledge of the target and adapt to changes in the target's appearance;
- An improved contrastive loss, integrated with a cross-entropy loss, is introduced to update the Siamese network;
- The Bayesian verification model is transferred for candidate selection; moreover, we find that visual object tracking can benefit from face verification algorithms.
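The loss in the second contribution above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the margin, the weighting factor `lam`, and the sigmoid-based similarity logit are illustrative assumptions, and the exact way the paper combines the improved contrastive loss with the cross-entropy term (Section 3.2.3) may differ.

```python
import numpy as np

def contrastive_loss(f1, f2, y, margin=1.0):
    """Classic contrastive loss (Hadsell et al., 2006).

    y = 1 marks a positive (same-target) pair, y = 0 a negative pair.
    Positive pairs are pulled together; negative pairs are pushed
    apart until their distance exceeds `margin`.
    """
    d = np.linalg.norm(f1 - f2, axis=1)                # Euclidean distance per pair
    pos = y * d ** 2                                   # positives: minimise distance
    neg = (1 - y) * np.maximum(margin - d, 0.0) ** 2   # negatives: push past margin
    return 0.5 * np.mean(pos + neg)

def cross_entropy_loss(logits, y):
    """Binary cross-entropy on a pair-similarity logit."""
    p = 1.0 / (1.0 + np.exp(-logits))                  # sigmoid probability
    eps = 1e-12                                        # numerical safety
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def improved_loss(f1, f2, logits, y, lam=0.5, margin=1.0):
    """Contrastive loss blended with a cross-entropy term, weighted by `lam`."""
    return contrastive_loss(f1, f2, y, margin) + lam * cross_entropy_loss(logits, y)

# Toy batch: two positive pairs (near-duplicate features) and one negative pair.
rng = np.random.default_rng(0)
f1 = rng.normal(size=(3, 8))
f2 = f1 + rng.normal(scale=0.05, size=(3, 8))
f2[2] += 5.0                                           # make the third pair dissimilar
y = np.array([1.0, 1.0, 0.0])
logits = np.array([3.0, 2.5, -3.0])                    # similarity logits from some head
print(improved_loss(f1, f2, logits, y))
```

With such a combination, the contrastive term shapes the embedding metric while the cross-entropy term supplies an explicit same/different classification signal; `lam` trades the two off.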
2. Related Work
2.1. Siamese Network for Visual Object Tracking
2.2. Online Algorithms for Visual Object Tracking
2.3. Loss Function for CNNs in Visual Tracking
2.4. Bayesian Verification Model
3. Proposed Algorithm
3.1. Siamese Network
3.2. Loss Function
3.2.1. Cross-Entropy Loss
3.2.2. Contrastive Loss
3.2.3. Improved Contrastive Loss
3.3. Implementation of the Bayesian Verification Model
4. Implementation Details
5. Experimental Validations
5.1. Ablation Study
5.2. Evaluation on OTB-2013
5.3. Evaluation on OTB-2015
5.4. Evaluation on OTB-50
5.5. Evaluation on VOT-2016
5.6. Evaluation on TempleColor
5.7. Qualitative Evaluation
5.8. Failure Case
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Benfold, B.; Reid, I. Stable Multi-Target Tracking in Real-Time Surveillance Video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, CO, USA, 20–25 June 2011; pp. 3457–3464. [Google Scholar]
- Chen, P.; Dang, Y.; Liang, R.; Zhu, W.; He, X. Real-time object tracking on a drone with multi-inertial sensing data. IEEE Trans. Intell. Transp. Syst. 2018, 10, 131–139. [Google Scholar] [CrossRef]
- Rautaray, S.S.; Agrawal, A. Vision based hand gesture recognition for human computer interaction: A survey. Artif. Intell. Rev. 2015, 43, 1–54. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651. [Google Scholar] [CrossRef] [PubMed]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-convolutional siamese networks for object tracking. In Proceedings of the European Conference on Computer Vision Workshops (ECCV 2016), Amsterdam, The Netherlands, 8–10 October 2016; pp. 850–865. [Google Scholar]
- Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Hierarchical convolutional features for visual tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 3074–3082. [Google Scholar]
- Tao, R.; Gavves, E.; Smeulders, A.W.M. Siamese instance search for tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 1420–1429. [Google Scholar]
- Nam, H.; Han, B. Learning Multi-domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 4293–4302. [Google Scholar]
- Fan, H.; Ling, H. SANet: Structure-Aware Network for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 2217–2224. [Google Scholar]
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 6931–6939. [Google Scholar]
- Zhang, T.Z.; Xu, C.S.; Yang, M.H. Multi-task Correlation Particle Filter for Robust Object Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 4819–4827. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojír, T.; Häger, G.; Lukežič, A.; Fernández, G.; et al. The Visual Object Tracking VOT2016 Challenge Results. In Proceedings of the European Conference on Computer Vision Workshops (ECCV 2016), Amsterdam, The Netherlands, 8–16 October 2016; pp. 777–823. [Google Scholar]
- Chatfield, K.; Simonyan, K.; Vedaldi, A.; Zisserman, A. Return of the Devil in the Details: Delving Deep into Convolutional Nets. In Proceedings of the British Machine Vision Conference (BMVC 2014), Nottingham, UK, 1–5 September 2014; pp. 1–12. [Google Scholar]
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), New York, NY, USA, 17–22 June 2006; pp. 1735–1742. [Google Scholar]
- Wang, N.Y.; Yeung, D.Y. Learning a deep compact image representation for visual tracking. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2013), Lake Tahoe, NV, USA, 5–8 December 2013; pp. 809–817. [Google Scholar]
- De Boer, P.; Kroese, D.; Mannor, S.; Rubinstein, R. A tutorial on the cross-entropy method. Ann. Oper. Res. 2005, 134, 19–67. [Google Scholar] [CrossRef]
- Chen, D.; Cao, X.; Wang, L.; Wen, F.; Sun, J. Bayesian face revisited: A joint formulation. In Proceedings of the European Conference on Computer Vision (ECCV 2012), Florence, Italy, 7–13 October 2012; pp. 566–579. [Google Scholar]
- Sun, Y.; Chen, Y.; Wang, X.; Tang, X. Deep learning face representation by joint identification-verification. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 1988–1996. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef]
- Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning Dynamic Siamese Network for Visual Object Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 1781–1789. [Google Scholar]
- Wang, Q.; Gao, J.; Xing, J.; Zhang, M.; Hu, W. DCFNet: Discriminant Correlation Filters Network for Visual Tracking. arXiv, 2017; arXiv:1704.04057. [Google Scholar]
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 4310–4318. [Google Scholar]
- Huang, G.B.; Ramesh, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: Updates and New Reporting Procedures; UM-CS-2014-003; Technical Report; University of Massachusetts Amherst: Amherst, MA, USA, 2014. [Google Scholar]
- Wu, Y.; Lim, J.; Yang, M.H. Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef] [PubMed]
- Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012; arXiv:1207.0580. [Google Scholar]
- Liang, P.; Blasch, E.; Ling, H. Encoding Color Information for Visual Tracking: Algorithms and Benchmark. IEEE Trans. Image Process. 2015, 24, 5630–5644. [Google Scholar] [CrossRef] [PubMed]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Zhang, J.; Ma, S.; Sclaroff, S. MEEM: Robust Tracking via Multiple Experts using Entropy Minimization. In Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 8–11 September 2014; pp. 188–203. [Google Scholar]
- Gao, J.; Ling, H.; Hu, W.; Xing, J. Transfer Learning Based Visual Tracking with Gaussian Processes Regression. In Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 8–11 September 2014; pp. 188–203. [Google Scholar]
- Danelljan, M.; Robinson, A.; Khan, F.; Felsberg, M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In Proceedings of the European Conference on Computer Vision (ECCV 2016), Amsterdam, The Netherlands, 8–10 October 2016; pp. 472–488. [Google Scholar]
- Danelljan, M.; Häger, G.; Khan, F.S.; Felsberg, M. Convolutional Features for Correlation Filter Based Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV 2015), Santiago, Chile, 7–13 December 2015; pp. 621–629. [Google Scholar]
- Ma, C.; Huang, J.B.; Yang, X.; Yang, M.H. Robust Visual Tracking via Hierarchical Convolutional Features. IEEE Trans. Pattern Anal. Mach. Intell. 2018. [Google Scholar] [CrossRef] [PubMed]
| | CCOT | SiamFC_3s | DeepSRDCF | SRDCF | MDNet | TGPR | HCF | OSNV |
|---|---|---|---|---|---|---|---|---|
| Overlap | 0.5332 | 0.5081 | 0.5231 | 0.5285 | 0.5366 | 0.4517 | 0.4372 | 0.5345 |
| Failures | 16.5817 | 32.3730 | 20.3462 | 28.3167 | 21.0817 | 41.0121 | 23.8569 | 17.5017 |
| EAO | 0.3310 | 0.2300 | 0.2763 | 0.2471 | 0.2572 | 0.1811 | 0.2203 | 0.3309 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chang, S.; Li, W.; Zhang, Y.; Feng, Z. Online Siamese Network for Visual Object Tracking. Sensors 2019, 19, 1858. https://doi.org/10.3390/s19081858