Abstract
In this paper, we design a multi-target object tracking algorithm that produces high-quality trajectories at low computational cost. Because it relies on online association, the algorithm is suited to applications such as autonomous driving and autonomous surveillance. We use CNN-based rather than hand-crafted features to achieve higher accuracy. We also present a novel grouping method for 2-D online environments that requires no prior knowledge of camera parameters, together with an affinity measure based on the groups maintained in previous frames. Comprehensive evaluations of our algorithm (CNNMTT) on the publicly available and widely used MOT16 dataset show that it achieves high-quality tracking results relative to the state of the art while being faster and involving much less computational cost.
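As a rough illustration of what online association involves, the sketch below matches the feature vectors of existing tracks to those of the detections in the current frame, using only information available up to that frame. It is a minimal sketch under stated assumptions, not the CNNMTT method: the feature vectors are plain lists standing in for CNN features, the cosine affinity and the `min_affinity` threshold are hypothetical choices, the group-based affinity described in the abstract is not modeled, and greedy matching stands in for an optimal assignment solver such as the Hungarian method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_features, detection_features, min_affinity=0.5):
    """Greedily match tracks to detections for a single frame.

    Returns (matches, unmatched_tracks, unmatched_detections), where
    matches is a list of (track_index, detection_index) pairs.
    The threshold min_affinity is an illustrative assumption.
    """
    # Score every track/detection pair.
    scored = [
        (cosine_similarity(t, d), ti, di)
        for ti, t in enumerate(track_features)
        for di, d in enumerate(detection_features)
    ]

    matches, used_tracks, used_dets = [], set(), set()
    # Visit pairs from highest to lowest affinity.
    for affinity, ti, di in sorted(scored, key=lambda s: s[0], reverse=True):
        if affinity < min_affinity:
            break  # all remaining pairs are too dissimilar
        if ti in used_tracks or di in used_dets:
            continue  # each track and detection is matched at most once
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)

    unmatched_tracks = [i for i in range(len(track_features)) if i not in used_tracks]
    unmatched_dets = [j for j in range(len(detection_features)) if j not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Toy example: two tracks, three detections (made-up feature vectors).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

In a tracker of this kind, unmatched detections would typically start new tracks and unmatched tracks would be kept for a few frames before being dropped; how CNNMTT handles these cases is not stated in the abstract.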
Notes
Multi-Target Tracking
Bag-Of-Words
Single-Object Tracking
Please note that, unlike the work of [28], where dynamic modeling was done in a 3D tracking environment, our model works in 2D environments and thus does not need prior knowledge about the camera.
Stochastic Gradient Descent
An observation on the MOT16 dataset.
References
Andriyenko A, Schindler K, Roth S (2012) Discrete-continuous optimization for multi-target tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1926–1933). IEEE
Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics, 401–406
Breitenstein MD, Reichlin F, Leibe B, Koller-Meier E, Van Gool L (2011) Online multiperson tracking-by-detection from a single, uncalibrated camera. IEEE Trans Pattern Anal Mach Intell 33(9):1820–1833
Choi W (2015) Near-online multi-target tracking with aggregated local flow descriptor. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3029–3037)
Choi W, Savarese S (2010) Multiple target tracking in world coordinate with single, minimally calibrated camera. In European Conference on Computer Vision (pp. 553–567). Springer Berlin Heidelberg
Dollár P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545
Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In ICML (pp. 647–655)
Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587)
Helbing D, Molnar P (1995) Social force model for pedestrian dynamics. Phys Rev E 51(5):4282
Henriques JF, Caseiro R, Batista J (2011) Globally optimal solution to multi-object tracking with merged measurements. In 2011 International Conference on Computer Vision (pp. 2470–2477). IEEE
Henschel R, Leal-Taixé L, Rosenhahn B, Schindler K (2016) Tracking with multi-level features. arXiv preprint arXiv:1607.07304
Hu M, Ali S, Shah M (2008) Detecting global motion patterns in complex videos. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on (pp. 1–5). IEEE
Kalal Z, Mikolajczyk K, Matas J (2010) Forward-backward error: Automatic detection of tracking failures. In Pattern recognition (ICPR), 2010 20th international conference on (pp. 2756–2759). IEEE
Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409–1422
Keuper M, Tang S, Zhongjie Y, Andres B, Brox T, Schiele B (2016) A multi-cut formulation for joint segmentation and tracking of multiple objects. arXiv preprint arXiv:1607.06317
Kratz L, Nishino K (2010) Tracking with local spatio-temporal motion patterns in extremely crowded scenes. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 693–700). IEEE
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105)
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 2169–2178). IEEE
Lee B, Erdenee E, Jin S, Rhee PK (2016) Multi-class multi-object tracking using changing point detection. arXiv preprint arXiv:1608.08434
Lee B, Erdenee E, Jin S, Nam MY, Jung YG, Rhee PK (2016) Multi-class multi-object tracking using changing point detection. In European Conference on Computer Vision (pp. 68–83). Springer, Cham
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Milan A, Roth S, Schindler K (2014) Continuous energy minimization for multitarget tracking. IEEE Trans Pattern Anal Mach Intell 36(1):58–72
Milan A, Leal-Taixe L, Reid I, Roth S, Schindler K (2016) MOT16: A Benchmark for Multi-Object Tracking. arXiv preprint arXiv:1603.00831
Mitzel D, Leibe B (2011) Real-time multi-person tracking with detector assisted structure propagation. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on (pp. 974–981). IEEE
Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38
Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1201–1208). IEEE
Possegger H, Mauthner T, Roth PM, Bischof H (2014) Occlusion geodesics for online multi-object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1306–1313)
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91–99)
Sanchez-Matilla R, Poiesi F, Cavallaro A (2016) Online Multi-target Tracking with Strong and Weak Detections. In ECCV Workshops (2) (pp. 84–99)
Sanchez-Matilla R, Poiesi F, Cavallaro A (2016) Multi-target tracking with strong and weak detections. In ECCV Workshops-Benchmarking Multi-Target Tracking (Vol. 5, No. 6, p. 18)
Stiefelhagen R, Bernardin K, Bowers R, Garofolo J, Mostefa D, Soundararajan P (2006) The CLEAR 2006 evaluation. In International Evaluation Workshop on Classification of Events, Activities and Relationships (pp. 1–44). Springer Berlin Heidelberg
Sugimura D, Kitani KM, Okabe T, Sato Y, Sugimoto A (2009) Using individuality to track individuals: clustering individual trajectories in crowds using local appearance and frequency trait. In 2009 IEEE 12th International Conference on Computer Vision (pp. 1467–1474). IEEE
Tang S, Andriluka M, Andres B, Schiele B (2017) Multiple people tracking by lifted multicut and person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3539–3548)
Tao D, Guo Y, Song M, Li Y, Yu Z, Tang YY (2016) Person re-identification by dual-regularized kiss metric learning. IEEE Trans Image Process 25(6):2726–2738
Wang X, Yang M, Zhu S, Lin Y (2013) Regionlets for generic object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 17–24)
Wu B, Nevatia R (2006) Tracking of multiple, partially occluded humans based on static body part detection. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 1, pp. 951–958). IEEE
Yang B, Nevatia R (2012) Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (pp. 1918–1925). IEEE
Yang M, Yu T, Wu Y (2007) Game-theoretic multiple target tracking. In 2007 IEEE 11th International Conference on Computer Vision (pp. 1–8). IEEE
Yang H, Shao L, Zheng F, Wang L, Song Z (2011) Recent advances and trends in visual tracking: a review. Neurocomputing 74(18):3823–3831
Yu F, Li W, Li Q, Liu Y, Shi X, Yan J (2016) POI: multiple object tracking with high performance detection and appearance feature. In European Conference on Computer Vision (pp. 36–42). Springer, Cham
Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833). Springer International Publishing
Zhao X, Gong D, Medioni G (2012) Tracking using motion patterns for very crowded scenes. In Computer Vision–ECCV 2012 (pp. 315–328). Springer Berlin Heidelberg
Cite this article
Mahmoudi, N., Ahadi, S.M. & Rahmati, M. Multi-target tracking using CNN-based features: CNNMTT. Multimed Tools Appl 78, 7077–7096 (2019). https://doi.org/10.1007/s11042-018-6467-6
DOI: https://doi.org/10.1007/s11042-018-6467-6