Abstract
Cost-effective depth and infrared sensors are now viable alternatives to conventional RGB sensors and offer advantages over RGB in domains such as autonomous navigation and remote sensing. Building computer vision and deep learning systems for depth and infrared data is therefore crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a well-labeled large dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons such as memory and privacy, it may not be possible to access the source data, so knowledge transfer must work with only the source models. We describe an effective solution, SOCKET (SOurce-free Cross-modal KnowledgE Transfer), for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap by using paired task-irrelevant data and by matching the mean and variance of the target features to the batch-norm statistics stored in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification, which do not account for the modality gap.
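As a concrete illustration of the statistic-matching idea in the abstract, the sketch below shows one plausible way to penalize the distance between a target batch's per-channel feature statistics and the running mean and variance stored in the source model's batch-norm layers. This is a minimal PyTorch sketch under our own assumptions: the helper names (register_bn_input_hooks, bn_stat_matching_loss) are hypothetical, and the exact loss form, layer selection, and weighting used in SOCKET may differ.

```python
import torch
import torch.nn as nn

def register_bn_input_hooks(model: nn.Module, store: dict):
    """Record the activation entering every BatchNorm2d layer, keyed by module name."""
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            def hook(mod, inputs, output, key=name):
                store[key] = inputs[0]  # keep the graph so gradients reach the target network
            handles.append(module.register_forward_hook(hook))
    return handles

def bn_stat_matching_loss(source_model: nn.Module, target_acts: dict):
    """L2 distance between target batch statistics and the source BN running statistics."""
    loss = 0.0
    for name, module in source_model.named_modules():
        if isinstance(module, nn.BatchNorm2d) and name in target_acts:
            feats = target_acts[name]                        # (N, C, H, W) target activations
            mu = feats.mean(dim=(0, 2, 3))                   # per-channel batch mean
            var = feats.var(dim=(0, 2, 3), unbiased=False)   # per-channel batch variance
            loss = loss + (mu - module.running_mean).norm(p=2) \
                        + (var - module.running_var).norm(p=2)
    return loss
```

In practice, such hooks would be registered on the target network (whose feature extractor is initialized from the source model, so batch-norm layer names line up), a batch of target-modality data would be run forward, and this loss would be added to the other adaptation objectives.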
Acknowledgements
SMA, SL, KCP and MJ were supported by Mitsubishi Electric Research Laboratories. SMA and ARC were partially supported by ONR grant N00014-19-1-2264 and the NSF grants CCF-2008020 and IIS-1724341.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmed, S.M., Lohit, S., Peng, KC., Jones, M.J., Roy-Chowdhury, A.K. (2022). Cross-Modal Knowledge Transfer Without Task-Relevant Source Data. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13694. Springer, Cham. https://doi.org/10.1007/978-3-031-19830-4_7
DOI: https://doi.org/10.1007/978-3-031-19830-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19829-8
Online ISBN: 978-3-031-19830-4
eBook Packages: Computer Science, Computer Science (R0)