Abstract
In this paper, we propose a method for fine-grained vehicle recognition in traffic surveillance video. Unlike general approaches to single-image fine-grained recognition, the method focuses on combining information from multiple frames and on handling viewpoint changes across videos. First, we detect vehicle instances and their corresponding frames in the input traffic video by vehicle tracking. For each vehicle instance, pose estimation is used to extract the 3D orientation in the corresponding frame. We encode the 3D orientation as an extra supervisory cue and merge it with the CNN feature to represent the vehicle's appearance and how it changes as the vehicle moves. In addition, a recurrent neural network (RNN) is used to select informative content across the traffic video and to fuse the CNN features of each vehicle's frames into a comprehensive feature that contains both spatial and temporal information for fine-grained recognition. We evaluate the method on our own CarVideo dataset, which was collected by surveillance cameras, and on the open dataset BoxCar116k. The experiments show that our method outperforms state-of-the-art methods for fine-grained recognition in traffic video applications.
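The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sin/cos orientation encoding, the choice of a GRU cell as the RNN, the random untrained weights, the omission of bias terms, and all names (`encode_orientation`, `GRUFusion`, `fuse_track`) and dimensions are assumptions made only for illustration. The per-frame CNN features are stand-in random vectors rather than outputs of a real network.

```python
import numpy as np


def encode_orientation(angles_rad):
    """Map per-frame 3D orientation angles (T, 3) to a (T, 6) sin/cos code.

    Assumption: the abstract does not say how orientation is encoded;
    sin/cos is used here only because it avoids the wrap-around at +/- pi.
    """
    angles = np.asarray(angles_rad, dtype=np.float64)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUFusion:
    """Minimal bias-free GRU that folds a frame sequence into one vector.

    Assumption: the abstract only says "RNN"; a GRU is one possible choice.
    Weights are random and untrained, so the output is illustrative only.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One weight matrix per gate, acting on the concatenation [x_t, h].
        shape = (input_dim + hidden_dim, hidden_dim)
        self.W_z = rng.uniform(-scale, scale, shape)  # update gate
        self.W_r = rng.uniform(-scale, scale, shape)  # reset gate
        self.W_h = rng.uniform(-scale, scale, shape)  # candidate state
        self.hidden_dim = hidden_dim

    def __call__(self, frames):
        h = np.zeros(self.hidden_dim)
        for x in frames:
            xh = np.concatenate([x, h])
            z = sigmoid(xh @ self.W_z)  # how much of the state to overwrite
            r = sigmoid(xh @ self.W_r)  # how much of the past to expose
            h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.W_h)
            h = (1.0 - z) * h + z * h_tilde
        return h  # final state summarises the whole track


def fuse_track(cnn_features, angles_rad, gru):
    """Concatenate appearance and orientation per frame, then fuse over time."""
    frames = np.concatenate(
        [np.asarray(cnn_features, dtype=np.float64), encode_orientation(angles_rad)],
        axis=1,
    )
    return gru(frames)


# Toy track: 8 frames, 16-dimensional stand-in CNN features per frame.
rng = np.random.default_rng(1)
T, D = 8, 16
cnn_features = rng.normal(size=(T, D))
angles = rng.uniform(-np.pi, np.pi, size=(T, 3))

gru = GRUFusion(input_dim=D + 6, hidden_dim=32)
track_feature = fuse_track(cnn_features, angles, gru)
print(track_feature.shape)  # (32,)
```

In a full pipeline, the fused vector would feed a classifier over vehicle models, and the RNN and classifier would be trained jointly; that part is omitted here.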
Acknowledgements
Supported by: the National Natural Science Foundation of China (Nos. 42075139, 42077232, 61272219); the National High Technology Research and Development Program of China (No. 2007AA01Z334); the Science and Technology Program of Jiangsu Province (Nos. BE2020082, BE2010072, BE2011058, BY2012190); the Program for New Century Excellent Talents in University of China (No. NCET-04-04605); the China Postdoctoral Science Foundation (No. 2017M621700); and the Innovation Fund of the State Key Laboratory for Novel Software Technology (No. ZZKT2021A17).
Ethics declarations
Conflicts of interests/Competing interests
There are no conflicts of interests/competing interests.
About this article
Cite this article
Hu, A., Sun, Z., Li, Q. et al. Fine-grained traffic video vehicle recognition based orientation estimation and temporal information. Multimed Tools Appl 82, 13745–13763 (2023). https://doi.org/10.1007/s11042-022-13811-1
DOI: https://doi.org/10.1007/s11042-022-13811-1