Abstract
Crowd counting has long been a challenging task because of perspective distortion and variation in head size. Previous methods either ignore the multi-scale information in images or simply apply convolutions with different kernel sizes to extract multi-scale features, so the multi-scale features they extract are incomplete. In this paper, we propose a crowd counting model based on a convolutional neural network (CNN), called the Multi-scale Dilated Convolution of Feature Fusion Network (MsDFNet). MsDFNet follows the density map regression approach: the CNN learns to predict a density map of the input image, from which the crowd count is estimated. The proposed network consists of three main components: a CNN that extracts low-level features; a multi-scale dilated convolution module combined with multi-column feature fusion blocks; and a density map regression module. The multi-scale dilated convolutions extract multi-scale high-level features, and the features extracted by the different columns are then fused. Combining the multi-scale dilated convolution module with the multi-column feature fusion blocks extracts more complete multi-scale features and improves counting performance on small targets. Experiments show that fusing multi-scale contextual features effectively addresses the variation in head sizes within images. We demonstrate the effectiveness of our method on two public datasets, ShanghaiTech and UCF_CC_50. Compared with previous state-of-the-art crowd counting algorithms in terms of MAE (Mean Absolute Error) and MSE (Mean Square Error), our method significantly improves performance, especially when head sizes vary. On the UCF_CC_50 dataset, it reduces the MAE by 28.6 compared with the previous state-of-the-art method (lower MAE indicates better performance).
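The ideas summarized above (parallel dilated convolutions that see different scales, fusion of their outputs, counting by summing a density map, and MAE/MSE evaluation) can be illustrated with a small, framework-free sketch. It is not the MsDFNet implementation: the single-channel fixed 3×3 kernel, the dilation rates (1, 2, 3), fusion by element-wise summation, and every function name here are illustrative assumptions. A k×k kernel with dilation d covers a window of k + (k − 1)(d − 1) pixels per side, so branches with larger rates respond to larger heads. For MSE, the sketch assumes the convention common in the crowd counting literature of reporting the square root of the mean squared count error; the abstract itself does not define it.

```python
import math


def dilated_conv2d(image, kernel, dilation=1):
    """Single-channel 2-D cross-correlation (what CNN layers compute) with a
    square, odd-sized kernel, the given dilation rate and 'same' zero padding."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = dilation * (k - 1) // 2  # keeps the output the same size as the input
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    y = i + u * dilation - pad
                    x = j + v * dilation - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[u][v] * image[y][x]
            out[i][j] = acc
    return out


def effective_kernel_size(k, dilation):
    """Side length of the window covered by a k x k kernel at this dilation."""
    return k + (k - 1) * (dilation - 1)


def multi_scale_fusion(features, kernel, dilations=(1, 2, 3)):
    """Hypothetical multi-column block: one branch per dilation rate, fused by
    element-wise summation (the paper's actual fusion may differ)."""
    branches = [dilated_conv2d(features, kernel, d) for d in dilations]
    h, w = len(features), len(features[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def count_from_density(density):
    """The estimated count is the sum (integral) of the density map."""
    return sum(sum(row) for row in density)


def mae_mse(pred_counts, true_counts):
    """MAE and root-mean-squared count error over a set of test images."""
    n = len(true_counts)
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / n
    mse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, mse


if __name__ == "__main__":
    impulse = [[0.0] * 5 for _ in range(5)]
    impulse[2][2] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    fused = multi_scale_fusion(impulse, ones)
    print(fused[2][2])  # 3.0: the centre tap of every branch hits the impulse
    print(mae_mse([10, 20], [12, 17]))
```

For an impulse at the centre of a 5×5 map, the dilation-1 branch spreads it over the central 3×3 block, while the dilation-2 branch places its nine taps two pixels apart and so spans the whole 5×5 window. A real network would use learned multi-channel kernels followed by nonlinearities; the point here is only how the dilation rate changes the spatial extent each branch sees.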
Acknowledgements
This work was supported by the Natural Science Foundation of Shandong Province (No. ZR2019MF050) and by the Shandong Province Colleges and Universities Youth Innovation Technology Plan Innovation Team Project (Grant No. 2020KJN011).
About this article
Cite this article
Liu, D., Wang, G. & Zhai, G. Multi-scale dilated convolution of feature Fusion Network for Crowd counting. Multimed Tools Appl 81, 37939–37952 (2022). https://doi.org/10.1007/s11042-022-13130-5