Dy-KD: Dynamic Knowledge Distillation for Reduced Easy Examples

Cheng Lin, Ning Jiang, Jialiang Tang, Xinlei Huang & Wenqing Wu

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1966)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Knowledge distillation usually trains a small model (the student) to mimic the knowledge of a large model (the teacher). Current knowledge distillation methods focus mainly on extracting and transforming knowledge; they assign equal weight to every example and so ignore how much individual examples in the dataset matter. To alleviate this problem, we propose Dynamic Knowledge Distillation (Dy-KD), which incorporates a curriculum strategy to selectively discard easy examples during knowledge distillation. Specifically, we estimate the difficulty of each example from the predictions of the superior teacher network and divide the dataset into easy examples and hard examples. These examples are then given different weights to adjust their contributions to the knowledge transfer. We validate Dy-KD on CIFAR-100 and Tiny-ImageNet. The experimental results show that: (1) using the curriculum strategy to discard easy examples prevents the model’s fitting ability from being consumed by fitting easy examples; (2) giving hard and easy examples different weights makes the model emphasize learning hard examples, which can boost the student’s performance. In addition, our method is easy to build on existing distillation methods.
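
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one way it could look, not the authors' implementation. It assumes that difficulty is one minus the teacher's probability for the true class, that a fixed threshold separates easy from hard examples, and that the share of discarded easy examples grows linearly with the epoch. All function names, the threshold, the weights, and the schedule are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def difficulty(teacher_logits, label):
    """Assumed score: 1 - teacher probability of the true class."""
    return 1.0 - softmax(teacher_logits)[label]


def example_weights(teacher_batch, labels, epoch, total_epochs,
                    threshold=0.5, hard_weight=1.5, easy_weight=0.5):
    """Per-example weights; discarded easy examples get weight 0.

    The threshold, the two weights, and the linear discard schedule
    are hypothetical; the abstract does not specify them.
    """
    scores = [difficulty(t, y) for t, y in zip(teacher_batch, labels)]
    # Easy examples, ordered from easiest to least easy.
    easy = sorted((i for i, s in enumerate(scores) if s < threshold),
                  key=lambda i: scores[i])
    # Assumed curriculum: discard a growing share of the easiest examples.
    n_drop = int(len(easy) * epoch / max(1, total_epochs))
    dropped = set(easy[:n_drop])
    weights = []
    for i, score in enumerate(scores):
        if i in dropped:
            weights.append(0.0)
        elif score < threshold:
            weights.append(easy_weight)
        else:
            weights.append(hard_weight)
    return weights


def weighted_kd_loss(student_batch, teacher_batch, weights, temperature=4.0):
    """Weighted mean of KL(teacher || student) on softened outputs."""
    total, norm = 0.0, 0.0
    for s_logits, t_logits, w in zip(student_batch, teacher_batch, weights):
        if w == 0.0:
            continue  # discarded easy example
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
        total += w * kl * temperature ** 2
        norm += w
    return total / norm if norm > 0 else 0.0
```

Because the weights are computed separately from the loss, the same weighting could in principle be applied to the loss of another distillation method, which is one reading of the claim that the approach is easy to build on existing methods.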

Acknowledgement

This research is supported by the Sichuan Science and Technology Program (No. 2022YFG0324) and the SWUST Doctoral Research Foundation under Grant 19zx7102.

Author information

Authors and Affiliations

School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621000, Sichuan, China
Cheng Lin, Ning Jiang, Xinlei Huang & Wenqing Wu
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China
Jialiang Tang
Jiangxi Qiushi Academy for Advanced Studies, Nanchang, 330036, Jiangxi, China
Cheng Lin, Ning Jiang & Xinlei Huang

Corresponding author

Correspondence to Ning Jiang.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Lin, C., Jiang, N., Tang, J., Huang, X., Wu, W. (2024). Dy-KD: Dynamic Knowledge Distillation for Reduced Easy Examples. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_18

DOI: https://doi.org/10.1007/978-981-99-8148-9_18
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)