Source-Free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Video-based Unsupervised Domain Adaptation (VUDA) methods improve the robustness of video models, enabling them to be applied to action recognition tasks across different environments. However, these methods require constant access to the source data during adaptation. Yet in many real-world applications, the subjects and scenes in the source video domain should be kept irrelevant to those in the target video domain, and with the increasing emphasis on data privacy, methods that require access to source data raise serious privacy concerns. To address this concern, a more practical domain adaptation scenario is formulated as Source-Free Video-based Domain Adaptation (SFVDA). Although a few Source-Free Domain Adaptation (SFDA) methods exist for image data, their performance degrades in SFVDA due to the multi-modal nature of videos, which contain additional temporal features. In this paper, we propose a novel Attentive Temporal Consistent Network (ATCoN) that addresses SFVDA by learning temporal consistency, guaranteed by two novel consistency objectives, namely feature consistency and source prediction consistency, computed across local temporal features. ATCoN further constructs effective overall temporal features by attending to local temporal features based on prediction confidence. Empirical results demonstrate the state-of-the-art performance of ATCoN across various cross-domain action recognition benchmarks. Code is provided at https://github.com/xuyu0010/ATCoN.
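The abstract does not fully specify either objective, so the following is a minimal PyTorch-style sketch of how the two consistency terms and the confidence-based attention could be instantiated. It is not the authors' implementation (see the linked repository for that): the tensor shapes, the MSE and KL formulations of the two consistency terms, and the use of the maximum softmax probability as the confidence measure are all illustrative assumptions.

    # Hedged sketch of ATCoN-style objectives, NOT the authors' code.
    # Assumes K local temporal features per video have already been
    # extracted, and that the source-trained classifier head is frozen.
    import torch
    import torch.nn.functional as F

    def atcon_style_losses(local_feats: torch.Tensor,
                           source_classifier: torch.nn.Module):
        """local_feats: (B, K, D); source_classifier: frozen source head."""
        B, K, D = local_feats.shape

        # Feature consistency (assumed form): pull each local temporal
        # feature toward the mean of all local features of the same video.
        mean_feat = local_feats.mean(dim=1, keepdim=True)            # (B, 1, D)
        feat_cons = F.mse_loss(local_feats, mean_feat.expand_as(local_feats))

        # Source prediction consistency (assumed form): local features
        # should yield similar predictions under the fixed source head.
        logits = source_classifier(local_feats.reshape(B * K, D))
        probs = logits.reshape(B, K, -1).softmax(dim=-1)             # (B, K, C)
        mean_prob = probs.mean(dim=1, keepdim=True)                  # (B, 1, C)
        pred_cons = F.kl_div(probs.clamp_min(1e-8).log(),
                             mean_prob.expand_as(probs),
                             reduction="batchmean")

        # Confidence-based attention: weight local features by prediction
        # confidence (here, max softmax probability) to form the overall
        # temporal feature used for classification.
        confidence = probs.max(dim=-1).values                        # (B, K)
        attn = confidence.softmax(dim=1).unsqueeze(-1)               # (B, K, 1)
        overall_feat = (attn * local_feats).sum(dim=1)               # (B, D)

        return feat_cons, pred_cons, overall_feat

In the source-free setting only the target videos and the source-trained weights are available, so a training loop would minimize some weighted sum of these consistency terms, along with whatever additional self-supervised losses the method prescribes, without ever touching source data.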

This research is jointly supported by A*STAR Singapore under its AME Programmatic Funds (Grant No. A20H6b0151) and Career Development Award (Grant No. C210112046), and by Nanyang Technological University, Singapore, under its NTU Presidential Postdoctoral Fellowship, “Adaptive Multimodal Learning for Robust Sensing and Recognition in Smart Cities” project fund.

Y. Xu and J. Yang contributed equally.

Author information

Correspondence to Zhenghua Chen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 779 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Yang, J., Cao, H., Wu, K., Wu, M., Chen, Z. (2022). Source-Free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_9

  • DOI: https://doi.org/10.1007/978-3-031-19830-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19829-8

  • Online ISBN: 978-3-031-19830-4

  • eBook Packages: Computer Science, Computer Science (R0)
