Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Cost-effective depth and infrared sensors are now a practical alternative to the usual RGB sensors, and they offer advantages over RGB in domains such as autonomous navigation and remote sensing. Building computer vision and deep learning systems for depth and infrared data is therefore crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a large, well-labeled dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons such as memory and privacy, it may not be possible to access the source data, and knowledge transfer then needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer, for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap using paired task-irrelevant data, as well as by matching the mean and variance of the target features with the batch-norm statistics stored in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification, which do not account for the modality gap.
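
The two ingredients mentioned in the abstract, matching target-feature statistics to the batch-norm statistics stored in the source model and aligning features computed on paired task-irrelevant data, can be sketched in a few lines of PyTorch. The fragment below is a minimal illustration only, not the authors' released implementation: the function names, the hook-based feature collection, and the plain L2 penalties are assumptions made here for clarity, and the paper's exact loss formulation may differ.

import torch
import torch.nn as nn


def collect_bn_inputs(model: nn.Module) -> dict:
    """Cache the input of every BatchNorm2d layer via forward hooks.
    (Hypothetical helper: per-layer target features are needed to compare
    against the batch-norm statistics frozen inside the source model.)"""
    feats = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            module.register_forward_hook(
                lambda mod, inp, out, key=name: feats.__setitem__(key, inp[0]))
    return feats


def bn_stat_matching_loss(model: nn.Module, feats: dict) -> torch.Tensor:
    """Penalize the gap between the batch mean/variance of target features
    and the running statistics stored in the source model's BN layers."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and name in feats:
            f = feats[name]                                 # (N, C, H, W)
            mu = f.mean(dim=(0, 2, 3))                      # per-channel mean
            var = f.var(dim=(0, 2, 3), unbiased=False)      # per-channel variance
            loss = loss + (mu - module.running_mean).pow(2).sum() \
                        + (var - module.running_var).pow(2).sum()
    return loss


def paired_feature_matching_loss(src_feat: torch.Tensor,
                                 tgt_feat: torch.Tensor) -> torch.Tensor:
    """Pull target-modality features toward source-modality features extracted
    from paired task-irrelevant data (e.g. RGB-depth pairs from an unrelated
    dataset), again with a simple L2 distance."""
    return (src_feat - tgt_feat).pow(2).mean()

A forward pass of target-modality data fills feats through the hooks, after which both losses can be added to whatever source-free adaptation objective is used to train the target model.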



Acknowledgements

SMA, SL, KCP and MJ were supported by Mitsubishi Electric Research Laboratories. SMA and ARC were partially supported by ONR grant N00014-19-1-2264 and the NSF grants CCF-2008020 and IIS-1724341.

Author information

Corresponding author

Correspondence to Suhas Lohit.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8567 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ahmed, S.M., Lohit, S., Peng, K.C., Jones, M.J., Roy-Chowdhury, A.K. (2022). Cross-Modal Knowledge Transfer Without Task-Relevant Source Data. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_7

  • DOI: https://doi.org/10.1007/978-3-031-19830-4_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19829-8

  • Online ISBN: 978-3-031-19830-4

  • eBook Packages: Computer Science, Computer Science (R0)
