research-article

Class Gradient Projection For Continual Learning

Authors:

Lianli Gao

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 5575 - 5583

https://doi.org/10.1145/3503161.3548054

Published: 10 October 2022

Abstract

Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL). Recent approaches tackle it by projecting each gradient update onto the direction orthogonal to the gradient subspace of existing tasks. While the results are remarkable, these approaches overlook that a gradient computed this way is not guaranteed to be orthogonal to the gradient subspace of each individual class, because of class deviation within tasks (e.g., distinguishing "Man" from "Sea" vs. differentiating "Boy" from "Girl"). The strategy may therefore still cause catastrophic forgetting for some classes. In this paper, we propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than from tasks. Updating along directions orthogonal to the gradient subspace of existing classes minimizes interference from other classes. To improve generalization and efficiency, we further design a Base Refining (BR) algorithm that combines similar classes and refines class bases dynamically. Moreover, we leverage a contrastive learning method to improve the model's ability to handle unseen tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed approach: it improves on previous methods by 2.0% on the CIFAR-100 dataset. The code is available at https://github.com/zackschen/CGP.
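The projection idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes, in the style of earlier gradient-projection methods, that each class subspace is estimated by an SVD of that class's layer inputs and that a weight gradient of shape (out, dim) is projected on its input side. The function names `class_basis`, `add_class`, and `project_gradient`, and the `energy` threshold of 0.95, are hypothetical choices made for illustration.

```python
import numpy as np


def class_basis(features, energy=0.95):
    """Orthonormal basis covering most of one class's feature space.

    features: array of shape (dim, n_samples), one column per sample.
    energy: fraction of squared singular-value mass to keep (assumed value).
    """
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest number of directions whose cumulative energy reaches the threshold.
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(s))
    return u[:, :k]


def add_class(bases, features, energy=0.95):
    """Extend the stored bases with the directions a new class contributes."""
    if bases is None:
        return class_basis(features, energy)
    # Keep only the part of the new class that the stored bases do not span.
    residual = features - bases @ (bases.T @ features)
    new_dirs = class_basis(residual, energy)
    # Re-orthonormalize to limit numerical drift.
    merged, _ = np.linalg.qr(np.hstack([bases, new_dirs]))
    return merged


def project_gradient(grad, bases):
    """Remove from a (out, dim) weight gradient its component in span(bases)."""
    return grad - (grad @ bases) @ bases.T
```

After projection, `project_gradient(grad, bases) @ bases` is numerically zero, so an update along the projected gradient leaves the layer's response to the stored class directions unchanged. The Base Refining step and the contrastive objective mentioned in the abstract are not sketched here, since the abstract does not specify them.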

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision representations > Image representations

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN: 9781450392037

DOI: 10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Chinese National Science & Technology Pillar Program
National Natural Science Foundation of China

Conference

MM '22: The 30th ACM International Conference on Multimedia

Sponsor: SIGMM

October 10-14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
