Abstract
Cost-effective depth and infrared sensors are now viable alternatives to conventional RGB sensors and offer advantages over RGB in domains such as autonomous navigation and remote sensing. Building computer vision and deep learning systems for depth and infrared data is therefore crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a well-labeled large dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons such as memory and privacy, it may not be possible to access the source data, so knowledge transfer must work with only the source models. We describe an effective solution, SOCKET (SOurce-free Cross-modal KnowledgE Transfer), for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap by using paired task-irrelevant data and by matching the mean and variance of the target features to the batch-norm statistics stored in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification, which do not account for the modality gap.
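As a concrete illustration of the statistic-matching idea in the abstract, the sketch below shows one plausible way to penalize the distance between a target batch's per-channel feature statistics and the running mean and variance stored in the source model's batch-norm layers. This is a minimal PyTorch sketch under our own assumptions: the helper names (register_bn_input_hooks, bn_stat_matching_loss) are hypothetical, and the exact loss form, layer selection, and weighting used in SOCKET may differ.

```python
import torch
import torch.nn as nn

def register_bn_input_hooks(model: nn.Module, store: dict):
    """Record the activation entering every BatchNorm2d layer, keyed by module name."""
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            def hook(mod, inputs, output, key=name):
                store[key] = inputs[0]  # keep the graph so gradients reach the target network
            handles.append(module.register_forward_hook(hook))
    return handles

def bn_stat_matching_loss(source_model: nn.Module, target_acts: dict):
    """L2 distance between target batch statistics and the source BN running statistics."""
    loss = 0.0
    for name, module in source_model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and name in target_acts:
            feats = target_acts[name]                        # (N, C, H, W) target activations
            mu = feats.mean(dim=(0, 2, 3))                   # per-channel batch mean
            var = feats.var(dim=(0, 2, 3), unbiased=False)   # per-channel batch variance
            loss = loss + (mu - module.running_mean).norm(p=2) \
                        + (var - module.running_var).norm(p=2)
    return loss
```

In practice, such hooks would be registered on the target network (whose feature extractor is initialized from the source model, so batch-norm layer names line up), a batch of target-modality data would be run forward, and this loss would be added to the other adaptation objectives.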
Acknowledgements
SMA, SL, KCP and MJ were supported by Mitsubishi Electric Research Laboratories. SMA and ARC were partially supported by ONR grant N00014-19-1-2264 and the NSF grants CCF-2008020 and IIS-1724341.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmed, S.M., Lohit, S., Peng, KC., Jones, M.J., Roy-Chowdhury, A.K. (2022). Cross-Modal Knowledge Transfer Without Task-Relevant Source Data. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_7
DOI: https://doi.org/10.1007/978-3-031-19830-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19829-8
Online ISBN: 978-3-031-19830-4
eBook Packages: Computer Science, Computer Science (R0)