Abstract
Accurately counting dense objects, such as crowds or vehicles, in an image is a challenging and meaningful task with wide use in public safety management and traffic flow prediction. Existing CNN-based density map estimation methods are ineffective at extracting counting features for long-distance queuing vehicles in traffic jams. In addition, these methods do not address counting in complex scenes, such as vehicle counting in mixed human-vehicle scenes. To tackle these issues, we propose MSCNet, a novel multi-scale dilated convolution channel-aware deep network for vehicle counting. The proposed network addresses the scale variation of long-distance queuing vehicles and improves the extraction of vehicle features in mixed human-vehicle scenes. MSCNet consists of a front-end module and three functional modules: the front-end module extracts the initial features of the input image; the direction-based perspective coding module (DPCM) encodes the perspective information of the image from four directions to extract continuous long-distance features; the multi-scale dilated residual module (MDRM) densely extracts features with large scale variation; and the channel-aware attention module (CAM) enhances the channel features that are important for vehicle counting in mixed human-vehicle scenes. We conducted extensive comparative experiments with MSCNet on the TRANCOS dataset, the VisDrone2021 Vehicle&Crowd dataset, and the ShanghaiTech dataset. The experimental results show that MSCNet outperforms state-of-the-art counting networks for dense vehicle counting, especially in mixed human-vehicle scenes.
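To make two of the building blocks named above more concrete, the following is a minimal sketch in plain Python, not the MSCNet implementation. It assumes a 1D signal in place of a 2D feature map, hypothetical dilation rates (1, 2, 3), branch averaging as the fusion rule, and a parameter-free sigmoid gate that omits the learned layers a real channel attention module would have. The function names `dilated_conv1d`, `multi_scale_residual`, and `channel_gate` are illustrative only and do not come from the paper.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D correlation with a dilated kernel."""
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_residual(signal, kernel, dilations=(1, 2, 3)):
    """Average several dilated branches, then add the input back (residual).

    The dilation rates and the averaging rule are assumptions of this sketch.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]


def channel_gate(channels):
    """Simplified channel gating: per-channel mean -> sigmoid -> rescale.

    A learned attention module would insert trainable layers between the
    mean and the sigmoid; they are omitted here for brevity.
    """
    means = [sum(c) / len(c) for c in channels]
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]
    return [[g * v for v in c] for g, c in zip(gates, channels)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With dilation 2, a 3-tap kernel reaches samples two positions away.
    print(dilated_conv1d(impulse, [1, 1, 1], 2))
    # Fusing rates 1, 2 and 3 spreads the impulse over all seven positions.
    print(multi_scale_residual(impulse, [1, 1, 1]))
    # Channels with a larger mean activation receive a larger gate.
    print(channel_gate([[1.0, -1.0], [2.0, 2.0]]))
```

In this toy setting, a single 3-tap kernel applied at three dilation rates covers seven input positions, which illustrates why stacking dilation rates is a common way to handle objects of very different apparent size without enlarging the kernel.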
Data availability
All data created or used during this study are publicly available at the following websites: https://gram.web.uah.es/data/datasets/trancos/index.html, https://opendatalab.com/VisDrone, and https://github.com/desenzhou/ShanghaiTechDataset.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62076117) and the Jiangxi Key Laboratory of Smart City, China (Grant No. 20192BCD40002).
Author information
Contributions
Qiyan Fu: Conceptualization, Methodology, Software, Validation, Investigation, Formal Analysis, Writing (Original Draft); Weidong Min (Corresponding Author): Conceptualization, Funding Acquisition, Resources, Supervision, Writing (Review & Editing); Chunbo Li: Data Curation, Visualization, Writing (Review & Editing); Haoyu Zhao: Resources, Investigation; Ye Cao: Writing (Review & Editing); Meng Zhu: Validation. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Fu, Q., Min, W., Li, C. et al. MSCNet: Dense vehicle counting method based on multi-scale dilated convolution channel-aware deep network. Geoinformatica 28, 245–269 (2024). https://doi.org/10.1007/s10707-023-00503-7
DOI: https://doi.org/10.1007/s10707-023-00503-7