Abstract
Crowd counting has important applications in public safety and pandemic control. A robust and practical crowd counting system must be able to learn continuously from incoming data in new domains in real-world scenarios, rather than fitting a single domain only. Off-the-shelf methods have several drawbacks when handling multiple domains: (1) due to the discrepancies in intrinsic data distributions across domains, a model's performance on old domains becomes limited (or even drops dramatically) after it is trained on images from new domains, a phenomenon called catastrophic forgetting; (2) a model well trained on a specific domain performs imperfectly on other unseen domains because of domain shift; (3) storage overhead grows linearly, whether all the data are mixed for training or dozens of separate models are trained for different domains as new ones become available. To overcome these issues, we investigate a new crowd counting task in an incremental-domain training setting, called lifelong crowd counting. Its goal is to alleviate catastrophic forgetting and improve generalization ability using a single model updated with incremental domains. Specifically, we propose a self-distillation learning framework as a benchmark (forget less, count better, or FLCB) for lifelong crowd counting, which helps the model leverage previously learned meaningful knowledge in a sustainable manner for better crowd counting and mitigates forgetting when new data arrive. A new quantitative metric, normalized backward transfer (nBwT), is developed to evaluate the degree of forgetting of the model in the lifelong learning process. Extensive experimental results demonstrate the superiority of our proposed benchmark in achieving a low catastrophic forgetting degree and strong generalization ability.
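The nBwT metric builds on the backward-transfer idea from continual learning. The exact normalization used in nBwT is defined in the full paper; as a rough illustration only, a GEM-style (unnormalized) backward-transfer computation adapted to error metrics such as MAE, where lower is better and a positive value therefore indicates forgetting, might look like the following sketch (the function name `backward_transfer` and the `results` numbers are hypothetical):

```python
def backward_transfer(mae):
    """GEM-style backward transfer adapted to error metrics.

    mae[t][i] is the MAE on domain i after the model has been trained
    sequentially on domains 0..t. Because lower MAE is better, a
    positive result means forgetting: the errors on old domains grew
    during later training stages.
    """
    T = len(mae)
    # Compare the final error on each old domain with the error
    # measured right after that domain was learned.
    deltas = [mae[T - 1][i] - mae[i][i] for i in range(T - 1)]
    return sum(deltas) / len(deltas)

# Hypothetical 3-domain run: rows = training stage, columns = domain.
results = [
    [60.0,  0.0,   0.0],
    [75.0, 90.0,   0.0],
    [82.0, 98.0, 110.0],
]
print(backward_transfer(results))  # (22 + 8) / 2 = 15.0
```

The nBwT of the paper further normalizes this quantity so that forgetting degrees are comparable across domains of different difficulty; see the full text for its precise definition.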
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 62176059, 62101136, and U1811463), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), Zhangjiang Lab, the Shanghai Municipal of Science and Technology Project (No. 20JC1419500), the Shanghai Sailing Program (No. 21YF1402800), the Natural Science Foundation of Shanghai (No. 21ZR1403600), and the Shanghai Center for Brain Science and Brain-inspired Technology
Contributors
Jiaqi GAO designed the research and drafted the paper. Jingqi LI contributed ideas for the experiments and analysis. Jingqi LI, Hongming SHAN, Yanyun QU, James Z. WANG, Fei-Yue WANG, and Junping ZHANG helped organize and revise the paper. Jiaqi GAO, Hongming SHAN, and Junping ZHANG finalized the paper.
Compliance with ethics guidelines
Jiaqi GAO, Jingqi LI, Hongming SHAN, Yanyun QU, James Z. WANG, Fei-Yue WANG, and Junping ZHANG declare that they have no conflict of interest.
Data availability
The data that support the findings of this study are available from the corresponding authors upon reasonable request.
List of supplementary materials
1 Domain concept and gaps of different datasets
2 Effect of different training orders
Fig. S1 Data distributions of four benchmark datasets
Table S1 Forgetting degree comparison results with different training orders
Table S2 Generalization comparison results with different training orders on the unseen JHU-Crowd++ dataset
Cite this article
Gao, J., Li, J., Shan, H. et al. Forget less, count better: a domain-incremental self-distillation learning benchmark for lifelong crowd counting. Front Inform Technol Electron Eng 24, 187–202 (2023). https://doi.org/10.1631/FITEE.2200380