Research article · DOI: 10.1145/3627673.3679801 · CIKM Conference Proceedings

Adaptive Cascading Network for Continual Test-Time Adaptation

Published: 21 October 2024

Abstract

We study the problem of continual test-time adaptation, where the goal is to adapt a source pre-trained model to a sequence of unlabelled target domains at test time. Existing test-time training methods suffer from several limitations: (1) mismatch between the feature extractor and classifier; (2) interference between the main and self-supervised tasks; (3) inability to quickly adapt to the current distribution. In light of these challenges, we propose a cascading paradigm that simultaneously updates the feature extractor and classifier at test time, mitigating the mismatch between them and enabling long-term model adaptation. The pre-training of our model is structured within a meta-learning framework, thereby minimizing the interference between the main and self-supervised tasks and encouraging fast adaptation in the presence of limited unlabelled data. Additionally, we introduce innovative evaluation metrics, average accuracy and forward transfer, to effectively measure the model's adaptation capabilities in dynamic, real-world scenarios. Extensive experiments and ablation studies demonstrate the superiority of our approach on a range of tasks including image classification, text classification, and speech recognition. Our code is publicly available at https://github.com/Nyquixt/CascadeTTA.
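To make the cascading idea concrete, here is a minimal PyTorch sketch of the core mechanism the abstract describes: on each unlabelled test batch, gradients flow into both the feature extractor and the classifier, so the classifier stays matched to the drifting feature space. This is an illustrative sketch, not the authors' implementation: it uses entropy minimization as a stand-in self-supervised objective and toy linear modules, and all names (FeatureExtractor, adapt_and_predict, ...) are hypothetical rather than taken from the released code.

```python
# Illustrative sketch only; the paper's actual architecture, auxiliary task,
# and cascading schedule may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureExtractor(nn.Module):
    def __init__(self, in_dim: int = 32, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class Classifier(nn.Module):
    def __init__(self, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.fc(z)


def entropy(logits: torch.Tensor) -> torch.Tensor:
    # Mean Shannon entropy of the softmax predictions over the batch,
    # used here as a stand-in unsupervised adaptation signal.
    log_p = F.log_softmax(logits, dim=1)
    return -(log_p.exp() * log_p).sum(dim=1).mean()


def adapt_and_predict(extractor, classifier, x, optimizer, steps: int = 1):
    # Joint update: gradients reach BOTH modules, unlike classifier-frozen
    # test-time training variants, mitigating extractor/classifier mismatch.
    for _ in range(steps):
        optimizer.zero_grad()
        entropy(classifier(extractor(x))).backward()
        optimizer.step()
    with torch.no_grad():
        return classifier(extractor(x)).argmax(dim=1)


if __name__ == "__main__":
    extractor, classifier = FeatureExtractor(), Classifier()
    optimizer = torch.optim.SGD(
        list(extractor.parameters()) + list(classifier.parameters()), lr=1e-3
    )
    # Stand-in for a stream of unlabelled batches from shifting target domains.
    for x in (torch.randn(8, 32) for _ in range(3)):
        preds = adapt_and_predict(extractor, classifier, x, optimizer)
        print(preds.tolist())
```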
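For reference, one standard way to formalize the two evaluation metrics in the continual-learning literature (a GEM-style formulation; the paper's exact definitions may differ) is as follows. Write R_{i,j} for the accuracy on domain j measured after the model has adapted through the first i domains, and b̄_j for the accuracy of the unadapted source model on domain j. Then

```latex
\[
\text{Average Accuracy} = \frac{1}{T}\sum_{t=1}^{T} R_{t,t},
\qquad
\text{Forward Transfer} = \frac{1}{T-1}\sum_{t=2}^{T}\bigl(R_{t-1,\,t} - \bar{b}_t\bigr),
\]
```

so forward transfer is positive when adapting to earlier domains already improves accuracy on a domain before it is encountered.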



Published In
    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
    October 2024
    5705 pages
    ISBN:9798400704369
    DOI:10.1145/3627673
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. continual test-time adaptation
    2. self-supervised learning
    3. transfer learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation through the Faculty Early Career Development Program (NSF CAREER) Award
    • Department of Defense under the Defense Established Program to Stimulate Competitive Research (DoD DEPSCoR) Award

    Conference

    CIKM '24

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

