Abstract
Crowd counting has made significant progress with deep convolutional neural networks. However, most existing methods do not fully exploit spatial context information, which makes it difficult for them to count congested crowds accurately. To this end, we propose a novel Adaptive Multi-scale Context Aggregation Network (MSCANet), in which a Multi-scale Context Aggregation module (MSCA) is designed to adaptively extract and aggregate contextual information from different scales of the crowd. More specifically, for each input, we first extract multi-scale context features via atrous convolution layers. Then, the multi-scale context features are progressively aggregated via a channel attention mechanism to enrich the crowd representations at different scales. Finally, a \(1\times 1\) convolution layer is applied to regress the crowd density. We perform extensive experiments on three public datasets, ShanghaiTech Part_A, UCF_CC_50 and UCF-QNRF, and the experimental results demonstrate the superiority of our method over current state-of-the-art methods.
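The three steps of the MSCA pipeline (atrous multi-scale extraction, progressive channel-attention aggregation, \(1\times 1\) density regression) can be sketched in NumPy as below. This is a minimal, hypothetical illustration rather than the authors' implementation: the dilation rates (1, 2, 3), the squeeze-and-excitation form of the channel attention, the rule that each new scale is added to the running aggregate before re-weighting, and the ReLU on the density output are all assumptions, and the weights are random rather than trained.

```python
import numpy as np


def atrous_conv3x3(x, w, dilation):
    """3x3 atrous (dilated) convolution with zero 'same' padding.

    x: input features, shape (C_in, H, W)
    w: kernel, shape (C_out, C_in, 3, 3)
    """
    _, h, wd = x.shape
    d = dilation
    xp = np.pad(x, ((0, 0), (d, d), (d, d)))  # pad by d so output stays H x W
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Input sampled at kernel tap (i, j); taps are spaced d pixels apart
            patch = xp[:, i * d:i * d + h, j * d:j * d + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gating (assumed form of the attention).

    x: (C, H, W); w1: (C // r, C); w2: (C, C // r)
    """
    s = x.mean(axis=(1, 2))                # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)            # excitation: FC + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # FC + sigmoid -> channel weights in (0, 1)
    return x * a[:, None, None]            # re-weight each channel


def msca(x, conv_weights, rates, att_weights):
    """Hypothetical MSCA: one atrous branch per rate, aggregated progressively."""
    feats = [np.maximum(atrous_conv3x3(x, w, r), 0.0)
             for w, r in zip(conv_weights, rates)]
    agg = feats[0]
    for f, (w1, w2) in zip(feats[1:], att_weights):
        # Assumption: add the next scale, then let channel attention re-weight the sum
        agg = channel_attention(agg + f, w1, w2)
    return agg


def regress_density(feat, w_1x1):
    """1x1 convolution to a single-channel map: a per-pixel linear map over channels."""
    return np.maximum(np.einsum("c,chw->hw", w_1x1, feat), 0.0)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
rates = (1, 2, 3)  # hypothetical dilation rates
x = rng.standard_normal((C, H, W))
conv_ws = [0.1 * rng.standard_normal((C, C, 3, 3)) for _ in rates]
att_ws = [(rng.standard_normal((C // 2, C)), rng.standard_normal((C, C // 2)))
          for _ in rates[1:]]
density = regress_density(msca(x, conv_ws, rates, att_ws), rng.standard_normal(C))
count = float(density.sum())  # estimated count = integral (sum) of the density map
```

Summing the predicted density map gives the crowd count, which is how density-regression counters are typically evaluated; the larger dilation rates widen each branch's receptive field without adding parameters, which is the usual motivation for atrous convolution in congested scenes.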
Acknowledgements
This work is supported by the Natural Science Foundation of Shanghai under Grant No. 19ZR1455300, and the National Natural Science Foundation of China under Grant No. 61806126.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Zhao, H., Zhou, F., Zhang, Q., Shi, Y., Liang, L. (2021). MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_1
DOI: https://doi.org/10.1007/978-3-030-67835-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67834-0
Online ISBN: 978-3-030-67835-7
eBook Packages: Computer Science, Computer Science (R0)