Abstract
High-quality crowd density maps preserve much of the spatial information about crowd distribution, which provides significant prior information for crowd behavior analysis and anomaly detection. Recent work on crowd density estimation pays more attention to the accuracy of crowd counting and neglects the quality of the estimated density map. Hence, in this paper, we propose an end-to-end crowd density estimation network that generates high-quality crowd density maps. The original pixel-level Euclidean distance loss function in the Multi-column Convolutional Neural Network (MCNN) is replaced by a perceptual loss network. By optimizing the perceptual loss function, which is defined as the difference between high-level semantic features generated by a pre-trained network, high-quality density maps can be obtained. At the same time, the accuracy of crowd counting and the sensitivity to the external environment can be improved. Extensive experiments conducted on challenging datasets show that the proposed method outperforms state-of-the-art methods in both crowd counting accuracy and density estimation quality.
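The contrast between a pixel-level Euclidean loss and a feature-level (perceptual) loss can be illustrated with a minimal sketch. This is not the network described in the paper: the fixed 3x3 filter followed by a ReLU is only a hypothetical stand-in for the pre-trained feature extractor, and all function names and the toy 4x4 density maps are assumptions made for illustration.

```python
# Illustrative sketch only. A fixed 3x3 filter + ReLU stands in for the
# pre-trained feature network (an assumption, not the paper's architecture).

def conv3x3_relu(img, kernel):
    """Valid 3x3 cross-correlation followed by ReLU (toy feature extractor)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = sum(kernel[a][b] * img[i + a][j + b]
                    for a in range(3) for b in range(3))
            row.append(max(s, 0.0))
        out.append(row)
    return out

def mse(a, b):
    """Mean squared difference between two equally sized 2D maps."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def pixel_loss(pred, target):
    """Pixel-level Euclidean (MSE) loss between two density maps."""
    return mse(pred, target)

def perceptual_loss(pred, target, kernel):
    """MSE between feature maps of the prediction and the ground truth."""
    return mse(conv3x3_relu(pred, kernel), conv3x3_relu(target, kernel))

def count(density):
    """Crowd count is the sum over the density map."""
    return sum(sum(row) for row in density)

# Hypothetical Laplacian-like filter and toy maps: one person, with the
# prediction placing the person one pixel away diagonally.
KERNEL = [[0.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 0.0]]
target = [[0.0] * 4 for _ in range(4)]
pred = [[0.0] * 4 for _ in range(4)]
target[1][1] = 1.0
pred[2][2] = 1.0

pixel = pixel_loss(pred, target)
perceptual = perceptual_loss(pred, target, KERNEL)
```

In this toy case both maps give the same count, so the count alone cannot tell them apart, while both losses penalize the misplaced density. In practice the feature extractor would be a fixed pre-trained network rather than a single hand-picked filter.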
Funding
This study was funded by the National Natural Science Foundation of China (grant number: 61701029) and the Basic Research Foundation of Beijing Institute of Technology (grant number: 20170542008).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Fan, Z., Zhu, Y., Song, Y. et al. Generating high quality crowd density map based on perceptual loss. Appl Intell 50, 1073–1085 (2020). https://doi.org/10.1007/s10489-019-01573-7
DOI: https://doi.org/10.1007/s10489-019-01573-7