
DOI: 10.1145/3664647.3681587

Live on the Hump: Self Knowledge Distillation via Virtual Teacher-Students Mutual Learning

Published: 28 October 2024

Abstract

To address the limitations of current self-knowledge distillation methods, which never fully exploit the knowledge of shallow exits and neglect the impact of the auxiliary exits' structure on network performance, this paper proposes LOTH, a novel self-knowledge distillation framework based on virtual teacher-students mutual learning. A knowledgeable virtual teacher is constructed from the rich feature maps of each exit to assist the learning of every exit, while the logit knowledge of each exit is in turn incorporated to guide the learning of the virtual teacher; the two sides learn mutually through the well-designed loss in LOTH. Moreover, two kinds of auxiliary building blocks are designed to balance the efficiency and effectiveness of the network. Extensive experiments with diverse backbones on CIFAR-100 and Tiny-ImageNet validate the effectiveness of LOTH, which achieves superior performance with fewer resources compared with state-of-the-art distillation methods. The code of LOTH is available on GitHub: https://github.com/cloak-s/LOTH.
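The abstract does not give the exact formulation of LOTH's mutual-learning objective, so the following is only a minimal sketch of the general idea, assuming a PyTorch multi-exit classifier. The function name mutual_distillation_loss, the temperature and weighting values, and the way the virtual teacher's logits are taken as a given tensor (e.g. produced by a head over fused exit feature maps) are illustrative assumptions, not the paper's actual loss; see the official repository for the real implementation.

import torch
import torch.nn.functional as F

def mutual_distillation_loss(exit_logits, teacher_logits, labels, T=4.0, alpha=1.0):
    # Hypothetical bidirectional KD loss between multi-exit students and a
    # virtual teacher whose logits come from fused exit feature maps.
    # Hard-label supervision for the virtual teacher and every exit.
    loss = F.cross_entropy(teacher_logits, labels)
    t_log_p = F.log_softmax(teacher_logits / T, dim=1)
    t_p = t_log_p.exp()
    for logits in exit_logits:
        loss = loss + F.cross_entropy(logits, labels)
        s_log_p = F.log_softmax(logits / T, dim=1)
        s_p = s_log_p.exp()
        # Teacher -> exit: each exit mimics the virtual teacher's soft targets.
        loss = loss + alpha * (T * T) * F.kl_div(s_log_p, t_p.detach(), reduction="batchmean")
        # Exit -> teacher: the virtual teacher is also guided by the exit's logits.
        loss = loss + alpha * (T * T) * F.kl_div(t_log_p, s_p.detach(), reduction="batchmean")
    return loss

# Example usage with three exits and a virtual teacher on a 100-class problem:
exit_logits = [torch.randn(8, 100, requires_grad=True) for _ in range(3)]
teacher_logits = torch.randn(8, 100, requires_grad=True)
labels = torch.randint(0, 100, (8,))
loss = mutual_distillation_loss(exit_logits, teacher_logits, labels)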



      Information

      Published In

      MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
      October 2024
      11719 pages
ISBN: 979-8-4007-0686-8
DOI: 10.1145/3664647
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. feature fusion
      2. knowledge distillation
      3. multi-exits
      4. self-distillation

      Qualifiers

      • Research-article


      Conference

      MM '24: The 32nd ACM International Conference on Multimedia
      October 28 - November 1, 2024
      Melbourne VIC, Australia

      Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
