DOI: 10.1145/3394171.3413698
research-article

Scale-aware Progressive Optimization Network

Published: 12 October 2020

Abstract

Crowd counting has attracted increasing attention due to its wide range of applications. One of the most essential challenges in this domain is large scale variation, which degrades the accuracy of density estimation. To this end, we propose a scale-aware progressive optimization network (SPO-Net) for crowd counting, which trains a scale-adaptive network to achieve high-quality density map estimation and overcome the scale variation problem in highly congested scenes. Concretely, the first phase of SPO-Net, the band-pass stage, concentrates on preprocessing the input image and fusing high-level semantic information with low-level spatial information from separate multi-layer features. The second phase of SPO-Net, the rolling guidance stage, aims to learn a scale-adapted network using multi-scale features and a rolling training manner. To better learn the local correlation of multi-size regions and reduce redundant computation, we introduce a progressive optimization strategy. Extensive experiments on three challenging crowd counting datasets not only demonstrate the efficacy of each part of SPO-Net, but also suggest the superiority of our proposed method over state-of-the-art approaches.
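The abstract does not say how density maps are constructed. As general background, and not as the SPO-Net procedure, a common convention in density-based crowd counting is to place a normalized Gaussian at each annotated head position, so that the map sums to the number of people. The sketch below assumes a fixed-width Gaussian. The function names `gaussian_kernel` and `density_map` and the value `sigma=4.0` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a 2D Gaussian kernel of shape (size, size) that sums to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int, sigma: float = 4.0) -> np.ndarray:
    """Build a density map from (row, col) head annotations inside the image.

    Assumption for illustration: a fixed sigma for every head. Kernels that
    overlap the image border are cropped and renormalized so that each
    annotated person still contributes exactly 1 to the total.
    """
    size = int(6 * sigma) | 1          # odd width covering about +/- 3 sigma
    half = size // 2
    kernel = gaussian_kernel(size, sigma)
    dmap = np.zeros((height, width), dtype=np.float64)
    for r, c in points:
        # Image region touched by the kernel, clipped to the image bounds.
        r0, r1 = max(r - half, 0), min(r + half + 1, height)
        c0, c1 = max(c - half, 0), min(c + half + 1, width)
        # Matching region of the kernel.
        patch = kernel[r0 - (r - half): r1 - (r - half),
                       c0 - (c - half): c1 - (c - half)]
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


# Three hypothetical head annotations in a 64 x 64 image.
points = [(0, 0), (10, 20), (63, 63)]
dmap = density_map(points, 64, 64)
# The sum of the map is the head count (3 here), up to floating-point error.
print(dmap.sum())
```

Under this convention, a predicted count is obtained by summing the estimated density map, which is why the quality of the density estimate matters for counting accuracy.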

Supplementary Material

ZIP File (mmfp1625aux.zip)
In this supplementary document, we provide more details and comparisons for the ablation experiments.
MP4 File (3394171.3413698.mp4)
We propose a scale-aware progressive optimization network (SPO-Net), which focuses on achieving high-quality density map regression for crowd counting. In response to scale variation, the biggest challenge in crowd counting, we propose a rolling structure and a progressive optimization strategy. The rolling structure extracts rich multi-scale features to learn a scale-adapted network, and the progressive optimization strategy helps the network learn efficiently. Our rolling structure is applied only in the training phase, so no additional parameters or computation are added in the testing phase, unlike previously proposed crowd counting methods. Our method has been tested on three challenging crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), and extensive experiments demonstrate the efficacy of each part of SPO-Net.


    Information

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. crowd counting
    2. multi-scale feature
    3. progressive optimization
    4. rolling

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research and Development Plan in China
    • National Natural Science Foundation of China under Grant
    • The Fundamental Research Funds for the Central Universities

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 129
    • Downloads (Last 12 months): 9
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 16 Nov 2024
