Research article · DOI: 10.1145/3627673.3679801 · CIKM Conference Proceedings

Adaptive Cascading Network for Continual Test-Time Adaptation

Published: 21 October 2024

Abstract

We study the problem of continual test-time adaptation, where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing test-time training methods suffer from several limitations: (1) mismatch between the feature extractor and classifier; (2) interference between the main and self-supervised tasks; (3) inability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach on a range of tasks including image classification, text classification, and speech recognition. Our code is publicly available at https://github.com/Nyquixt/CascadeTTA.
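To make the cascading idea concrete, here is a minimal PyTorch sketch of the core mechanism the abstract describes: on each unlabelled test batch, gradients flow into both the feature extractor and the classifier, so the classifier stays matched to the drifting feature space. This is an illustrative sketch, not the authors' implementation: it uses entropy minimization as a stand-in self-supervised objective and toy linear modules, and all names (FeatureExtractor, adapt_and_predict, ...) are hypothetical rather than taken from the released code.

```python
# Illustrative sketch only; the paper's actual architecture, auxiliary task,
# and cascading schedule may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureExtractor(nn.Module):
    def __init__(self, in_dim: int = 32, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class Classifier(nn.Module):
    def __init__(self, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.fc(z)


def entropy(logits: torch.Tensor) -> torch.Tensor:
    # Mean Shannon entropy of the softmax predictions over the batch,
    # used here as a stand-in unsupervised adaptation signal.
    log_p = F.log_softmax(logits, dim=1)
    return -(log_p.exp() * log_p).sum(dim=1).mean()


def adapt_and_predict(extractor, classifier, x, optimizer, steps: int = 1):
    # Joint update: gradients reach BOTH modules, unlike classifier-frozen
    # test-time training variants, mitigating extractor/classifier mismatch.
    for _ in range(steps):
        optimizer.zero_grad()
        entropy(classifier(extractor(x))).backward()
        optimizer.step()
    with torch.no_grad():
        return classifier(extractor(x)).argmax(dim=1)


if __name__ == "__main__":
    extractor, classifier = FeatureExtractor(), Classifier()
    optimizer = torch.optim.SGD(
        list(extractor.parameters()) + list(classifier.parameters()), lr=1e-3
    )
    # Stand-in for a stream of unlabelled batches from shifting target domains.
    for x in (torch.randn(8, 32) for _ in range(3)):
        preds = adapt_and_predict(extractor, classifier, x, optimizer)
        print(preds.tolist())
```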
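For reference, one standard way to formalize the two evaluation metrics in the continual-learning literature (a GEM-style formulation; the paper's exact definitions may differ) is as follows. Write R_{i,j} for the accuracy on domain j measured after the model has adapted through the first i domains, and b̄_j for the accuracy of the unadapted source model on domain j. Then

```latex
\[
\text{Average Accuracy} = \frac{1}{T}\sum_{t=1}^{T} R_{t,t},
\qquad
\text{Forward Transfer} = \frac{1}{T-1}\sum_{t=2}^{T}\bigl(R_{t-1,\,t} - \bar{b}_t\bigr),
\]
```

so forward transfer is positive when adapting to earlier domains already improves accuracy on a domain before it is encountered.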



Published In
    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
    October 2024
    5705 pages
    ISBN:9798400704369
    DOI:10.1145/3627673
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. continual test-time adaptation
    2. self-supervised learning
    3. transfer learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation through the Faculty Early Career Development Program (NSF CAREER) Award
    • Department of Defense under the Defense Established Program to Stimulate Competitive Research (DoD DEPSCoR) Award

    Conference

    CIKM '24

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

