Video-Based Crowd Counting Using a Multi-scale Optical Flow Pyramid Network

Mohammad Asiful Hossain¹²,
Kevin Cannons¹²,
Daesik Jang¹²,
Fabio Cuzzolin¹³ &
…
Zhan Xu¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12626))

Included in the following conference series:

Asian Conference on Computer Vision

914 Accesses
4 Citations

Abstract

This paper presents a novel approach to the task of video-based crowd counting, which can be formalized as the regression problem of learning a mapping from an input image to an output crowd density map. Convolutional neural networks (CNNs) have demonstrated striking accuracy gains in a range of computer vision tasks, including crowd counting. However, the dominant focus within the crowd counting literature has been on the single-frame case or applying CNNs to videos in a frame-by-frame fashion without leveraging motion information. This paper proposes a novel architecture that exploits the spatiotemporal information captured in a video stream by combining an optical flow pyramid with an appearance-based CNN. Extensive empirical evaluation on five public datasets comparing against numerous state-of-the-art approaches demonstrates the efficacy of the proposed architecture, with our methods reporting best results on all datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Estimating People Flows to Better Count Them in Crowded Scenes

Multi-feature-based crowd video modeling for visual event detection

Article 04 April 2020

DRENet: Giving Full Scope to Detection and Regression-Based Estimation for Video Crowd Counting

References

Sindagi, V.A., Patel, V.M.: A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recogn. Lett. 107, 3–16 (2018)
Article Google Scholar
Ge, W., Collins, R.T.: Marked point processes for crowd counting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2913–2920 (2009)
Google Scholar
Idrees, H., Soomro, K., Shah, M.: Detecting humans in dense crowds using locally-consistent scale prior and global occlusion reasoning. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1986–1998 (2015)
Article Google Scholar
Hu, P., Ramanan, D.: Finding tiny faces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1522–1530 (2017)
Google Scholar
Sindagi, V.A., Patel, V.M.: Generating high-quality crowd density maps using contextual pyramid CNNs. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1861–1870 (2017)
Google Scholar
Sam, D.B., Surya, S., Babu, R.V.: Switching convolutional neural network for crowd counting. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4031–4039. IEEE (2017)
Google Scholar
Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1091–1100 (2018)
Google Scholar
Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 589–597 (2016)
Google Scholar
Fang, Y., Zhan, B., Cai, W., Gao, S., Hu, B.: Locality-constrained spatial transformer network for video crowd counting. In: 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 814–819. IEEE (2019)
Google Scholar
Zhang, S., Wu, G., Costeira, J.P., Moura, J.M.: FCN-rLSTM: deep spatio-temporal neural networks for vehicle counting in city cameras. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3667–3676 (2017)
Google Scholar
Xiong, F., Shi, X., Yeung, D.Y.: Spatiotemporal modeling for crowd counting in videos. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5151–5159 (2017)
Google Scholar
Zhang, C., Li, H., Wang, X., Yang, X.: Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 833–841 (2015)
Google Scholar
Oñoro-Rubio, D., López-Sastre, R.J.: Towards perspective-free object counting with deep learning. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 615–629. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46478-7_38
Chapter Google Scholar
Boominathan, L., Kruthiventi, S.S., Babu, R.V.: CrowdNet: a deep convolutional network for dense crowd counting. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 640–644. ACM (2016)
Google Scholar
Sindagi, V.A., Patel, V.M.: CNN-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. ACM (2016)
Google Scholar
Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artif. Intell. 17, 185–203 (1981)
Article Google Scholar
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1647–1655 (2017)
Google Scholar
Chan, A.B., Liang, Z.S.J., Vasconcelos, N.: Privacy preserving crowd monitoring: counting people without people models or tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7 (2008)
Google Scholar
Chen, K., Loy, C.C., Gong, S., Xiang, T.: Feature mining for localised crowd counting. In: Proceedings of the British Machine Vision Conference, pp. 1–7 (2012)
Google Scholar
Idrees, H., Saleemi, I., Seibert, C., Shah, M.: Multi-source multi-scale counting in extremely dense crowd images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2547–2554 (2013)
Google Scholar
Hossain, M., Hosseinzadeh, M., Chanda, O., Wang, Y.: Crowd counting using scale-aware attention networks. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1280–1288. IEEE (2019)
Google Scholar
Xiong, H., Lu, H., Liu, C., Liu, L., Cao, Z., Shen, C.: From open set to closed set: counting objects by spatial divide-and-conquer. In: The IEEE International Conference on Computer Vision (ICCV) (2019)
Google Scholar
Yan, Z., et al.: Perspective-guided convolution networks for crowd counting. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 952–961 (2019)
Google Scholar
Zhao, Z., Li, H., Zhao, R., Wang, X.: Crossing-line crowd counting with two-phase deep neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 712–726. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_43
Chapter Google Scholar
Ma, Z., Chan, A.B.: Crossing the line: crowd counting by integer programming with local features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2539–2546 (2013)
Google Scholar
Ma, Z., Chan, A.B.: Counting people crossing a line using integer programming and local features. IEEE Trans. Circuits Syst. Video Technol. 26, 1955–1969 (2015)
Article Google Scholar
Cong, Y., Gong, H., Zhu, S.C., Tang, Y.: Flow mosaicking: real-time pedestrian counting without scene-specific learning. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1093–1100. IEEE (2009)
Google Scholar
Cao, L., Zhang, X., Ren, W., Huang, K.: Large scale crowd analysis based on convolutional neural network. Pattern Recogn. 48, 3016–3024 (2015)
Article Google Scholar
Fujisawa, S., Hasegawa, G., Taniguchi, Y., Nakano, H.: Pedestrian counting in video sequences based on optical flow clustering. Int. J. Image Process. 7, 1–16 (2013)
Article Google Scholar
Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1933–1941 (2016)
Google Scholar
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4733 (2017)
Google Scholar
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 568–576 (2014)
Google Scholar
Peng, X., Schmid, C.: Multi-region two-stream R-CNN for action detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 744–759. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_45
Chapter Google Scholar
Singh, B., Marks, T.K., Jones, M., Tuzel, O., Shao, M.: A multi-stream bi-directional recurrent neural network for fine-grained action detection. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 1961–1970 (2016)
Google Scholar
Singh, G., Saha, S., Sapienza, M., Torr, P., Cuzzolin, F.: Online real-time multiple spatiotemporal action localisation and prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3657–3666 (2017)
Google Scholar
Zhu, X., Xiong, Y., Dai, J., Yuan, L., Wei, Y.: Deep feature flow for video recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4141–4150 (2017)
Google Scholar
Li, M., Sun, L., Huo, Q.: Flow-guided feature propagation with occlusion aware detail enhancement for hand segmentation in egocentric videos. Comput. Vis. Image Underst. 187, 102785 (2019)
Article Google Scholar
Gadde, R., Jampani, V., Gehler, P.V.: Semantic video CNNs through representation warping. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4463–4472 (2017)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
Google Scholar
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (2014)
Google Scholar
Liu, L., Wang, H., Li, G., Ouyang, W., Lin, L.: Crowd counting using deep recurrent spatial-aware network. arXiv preprint arXiv:1807.00601 (2018)
Idrees, H., et al.: Composition loss for counting, density map estimation and localization in dense crowds. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 532–546 (2018)
Google Scholar
Liu, W., Salzmann, M., Fua, P.: Context-aware crowd counting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Google Scholar
Wan, J., Chan, A.: Adaptive density map generation for crowd counting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Google Scholar
Babu Sam, D., Sajjan, N.N., Venkatesh Babu, R., Srinivasan, M.: Divide and grow: capturing huge diversity in crowd images with incrementally growing CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3618–3626 (2018)
Google Scholar
Shi, Z., et al.: Crowd counting with deep negative correlation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5382–5390 (2018)
Google Scholar
Cao, X., Wang, Z., Zhao, Y., Su, F.: Scale aggregation network for accurate and efficient crowd counting. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)
Google Scholar
Jiang, X., et al.: Crowd counting and density estimation by trellis encoder-decoder networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6133–6142 (2018)
Google Scholar
Sindagi, V.A., Patel, V.M.: Multi-level bottom-top and top-bottom feature fusion for crowd counting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Google Scholar
Pham, V.Q., Kozakaya, T., Yamaguchi, O., Okada, R.: Count forest: co-voting uncertain number of targets using random forest for crowd density estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3253–3261 (2015)
Google Scholar
Sheng, B., Shen, C., Lin, G., Li, J., Yang, W., Sun, C.: Crowd counting via weighted VLAD on a dense attribute feature map. IEEE Trans. Circuits Syst. Video Technol. 28, 1788–1797 (2016)
Article Google Scholar
Xu, B., Qiu, G.: Crowd density estimation based on rich features and random projection forest. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8. IEEE (2016)
Google Scholar
Change Loy, C., Gong, S., Xiang, T.: From semi-supervised to transfer counting of crowds. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2256–2263 (2013)
Google Scholar
Yu, K., Tresp, V., Schwaighofer, A.: Learning Gaussian processes from multiple tasks. In: Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pp. 1012–1019 (2005)
Google Scholar
Liu, B., Vasconcelos, N.: Bayesian model adaptation for crowd counts. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4175–4183 (2015)
Google Scholar

Download references

Author information

Authors and Affiliations

Huawei Technologies Canada Co., Ltd., Burnaby, Canada
Mohammad Asiful Hossain, Kevin Cannons, Daesik Jang & Zhan Xu
Oxford Brookes University, Oxford, UK
Fabio Cuzzolin

Authors

Mohammad Asiful Hossain
View author publications
You can also search for this author in PubMed Google Scholar
Kevin Cannons
View author publications
You can also search for this author in PubMed Google Scholar
Daesik Jang
View author publications
You can also search for this author in PubMed Google Scholar
Fabio Cuzzolin
View author publications
You can also search for this author in PubMed Google Scholar
Zhan Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mohammad Asiful Hossain .

Editor information

Editors and Affiliations

Waseda University, Tokyo, Japan
Hiroshi Ishikawa
Institute of Automation of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Czech Technical University in Prague, Prague, Czech Republic
Tomas Pajdla
University of Pennsylvania, Philadelphia, PA, USA
Jianbo Shi

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 40147 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hossain, M.A., Cannons, K., Jang, D., Cuzzolin, F., Xu, Z. (2021). Video-Based Crowd Counting Using a Multi-scale Optical Flow Pyramid Network. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12626. Springer, Cham. https://doi.org/10.1007/978-3-030-69541-5_1

Download citation

DOI: https://doi.org/10.1007/978-3-030-69541-5_1
Published: 26 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69540-8
Online ISBN: 978-3-030-69541-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Video-Based Crowd Counting Using a Multi-scale Optical Flow Pyramid Network

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Estimating People Flows to Better Count Them in Crowded Scenes

Multi-feature-based crowd video modeling for visual event detection

DRENet: Giving Full Scope to Detection and Regression-Based Estimation for Video Crowd Counting

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Video-Based Crowd Counting Using a Multi-scale Optical Flow Pyramid Network

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Estimating People Flows to Better Count Them in Crowded Scenes

Multi-feature-based crowd video modeling for visual event detection

DRENet: Giving Full Scope to Detection and Regression-Based Estimation for Video Crowd Counting

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

1 Electronic supplementary material

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation