Abstract
Relative representations allow the alignment of latent spaces that embed data in extrinsically different ways but with similar relative distances between data points. This ability to compare different latent spaces for the same input lends itself to knowledge distillation techniques. We explore the applicability of relative representations to knowledge distillation by training a student model such that the relative representations of its outputs match the relative representations of the outputs of a teacher model. We test our Relative Representation Knowledge Distillation (RRKD) scheme on supervised and self-supervised image representation learning with MNIST and show that an encoder can be compressed to 47.71% of its original size while maintaining 91.92% of its full performance. We demonstrate that RRKD is competitive with or outperforms other relation-based distillation schemes in traditional distillation setups (CIFAR-10, CIFAR-100, SVHN) and in a transfer learning setting (Stanford Cars, Oxford-IIIT Pets, Oxford Flowers-102). Our results indicate that relative representations are an effective signal for knowledge distillation. Code is available at https://github.com/Ramos-Ramos/rrkd.
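To make the scheme concrete, below is a minimal sketch of an RRKD-style objective in PyTorch. It is an illustration under stated assumptions, not the authors' exact implementation: relative representations are computed as cosine similarities to a shared set of anchor inputs, as in Moschella et al. [12], and the student is matched to the teacher with an MSE loss (the anchor-selection strategy and the choice of MSE are assumptions here).

```python
import torch
import torch.nn.functional as F

def relative_representation(z, anchors):
    """Cosine similarity of each embedding in z to each anchor embedding."""
    z = F.normalize(z, dim=-1)
    anchors = F.normalize(anchors, dim=-1)
    return z @ anchors.T  # shape: (batch, n_anchors)

def rrkd_loss(student_z, teacher_z, student_anchors, teacher_anchors):
    """Match the student's relative representations to the teacher's.

    The anchors are the same anchor inputs embedded by each model, so both
    relative representations live in a shared n_anchors-dimensional space
    even when the student and teacher widths differ.
    """
    rel_s = relative_representation(student_z, student_anchors)
    rel_t = relative_representation(teacher_z, teacher_anchors)
    return F.mse_loss(rel_s, rel_t)

# Hypothetical usage: the embedding dimensions differ between models,
# but the relative representations share the anchor dimension.
student_z = torch.randn(32, 128)   # student embeddings of a batch
teacher_z = torch.randn(32, 512)   # teacher embeddings of the same batch
anchors_s = torch.randn(64, 128)   # anchor inputs embedded by the student
anchors_t = torch.randn(64, 512)   # same anchor inputs embedded by the teacher
loss = rrkd_loss(student_z, teacher_z, anchors_s, anchors_t)
```

Because both relative representations are similarity vectors over the same anchors, no learned projection is needed to bridge the dimensionality gap between student and teacher.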
Notes
1. Some works try to address this by projecting the student outputs with a learnable linear layer to have the same dimensionality as the teacher [8, 20]. Our work is similar in that regard, as computing the relative representations can also be considered a linear projection, but our method does not require learning the projection weights.
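To see why the relative-representation step can be read as a parameter-free linear projection, consider the sketch below (the dimensions and random anchors are hypothetical): once the anchors are fixed, mapping a normalized embedding to its anchor similarities is a fixed linear map, whereas the learnable projections of [8, 20] introduce weights that must be trained alongside the student.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_student, d_teacher, n_anchors = 128, 512, 64  # hypothetical sizes

# Learnable projection to the teacher's width, as in [8, 20]:
# the weights of this layer must be trained.
proj = nn.Linear(d_student, d_teacher)

# Relative-representation "projection": with the anchors fixed, the map
# z -> z @ anchors.T is linear in z and has no trainable parameters.
anchors = F.normalize(torch.randn(n_anchors, d_student), dim=-1)
z = F.normalize(torch.randn(32, d_student), dim=-1)
rel = z @ anchors.T  # shape: (32, n_anchors)
```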
References
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660 (2021)
Chen, H., Wang, Y., Xu, C., Xu, C., Tao, D.: Learning student networks via feature embedding. IEEE Trans. Neural Netw. Learn. Syst. 32(1), 25–35 (2020)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Huang, Z., Wang, N.: Like what you like: knowledge distill via neuron selectivity transfer. arXiv preprint arXiv:1707.01219 (2017)
Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., Liu, Q.: Tinybert: distilling BERT for natural language understanding. arXiv preprint arXiv:1909.10351 (2019)
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 554–561 (2013)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Liu, Y., Cao, J., Li, B., Yuan, C., Hu, W., Li, Y., Duan, Y.: Knowledge distillation via instance relationship graph. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7096–7104 (2019)
Moschella, L., Maiorca, V., Fumero, M., Norelli, A., Locatello, F., Rodolà, E.: Relative representations enable zero-shot latent space communication. arXiv preprint arXiv:2209.15430 (2022)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE (2008)
Olah, C.: Visualizing representations: deep learning and human beings (2015). http://colah.github.io/posts/2015-01-Visualizing-Representations/
Park, W., Kim, D., Lu, Y., Cho, M.: Relational knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3967–3976 (2019)
Parkhi, O.M., Vedaldi, A., Zisserman, A., Jawahar, C.: Cats and dogs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3498–3505. IEEE (2012)
Passalis, N., Tefas, A.: Learning deep representations with probabilistic knowledge transfer. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 268–284 (2018)
Peng, B., Jin, X., Liu, J., Li, D., Wu, Y., Liu, Y., Zhou, S., Zhang, Z.: Correlation congruence for knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5007–5016 (2019)
Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: Fitnets: hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Tung, F., Mori, G.: Similarity-preserving knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1365–1374 (2019)
You, S., Xu, C., Xu, C., Tao, D.: Learning from multiple teacher networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1285–1294 (2017)
Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. arXiv preprint arXiv:1612.03928 (2016)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ramos, P., Alampay, R., Abu, P. (2023). Knowledge Distillation with Relative Representations for Image Representation Learning. In: Burduk, R., Choraś, M., Kozik, R., Ksieniewicz, P., Marciniak, T., Trajdos, P. (eds) Progress on Pattern Classification, Image Processing and Communications. CORES IP&C 2023. Lecture Notes in Networks and Systems, vol 766. Springer, Cham. https://doi.org/10.1007/978-3-031-41630-9_14
DOI: https://doi.org/10.1007/978-3-031-41630-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41629-3
Online ISBN: 978-3-031-41630-9
eBook Packages: Intelligent Technologies and Robotics (R0)