Abstract
Distillation-aware Architecture Search (DAS) seeks the student architecture that achieves the best performance when distilling knowledge from a given teacher model. Previous DAS methods rely on time-consuming training-based search. Recently, the training-free DAS method DisWOT introduced KD-based proxies and achieved significant search acceleration; however, it requires manual proxy design and generalizes poorly to diverse architectures such as the Vision Transformer (ViT). To address these issues, we present Auto-DAS, an automatic proxy-discovery framework for training-free DAS based on an Evolutionary Algorithm (EA). Specifically, we empirically find that proxies conditioned on student instinct statistics and teacher-student interaction statistics effectively predict distillation accuracy. We therefore represent each proxy as a computation graph that takes instinct and interaction statistics as inputs, and construct a proxy search space whose basic transformations and network distance operators are inspired by previous proxy and KD-loss designs. Our EA then initializes a population of candidate proxies, evaluates them, applies crossover and mutation, and selects the candidate whose scores correlate best with distillation accuracy. An adaptive-elite selection strategy further improves search efficiency and balances exploitation and exploration. Finally, we run training-free DAS with the discovered proxy and distill the selected optimal student. In this way, our auto-discovery framework eliminates manual design and tuning, and adapts to different search spaces through direct correlation optimization. Extensive experiments demonstrate that Auto-DAS generalizes well across architectures and search spaces (e.g., ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy. Code at: https://github.com/lliai/Auto-DAS.
L. Li—Contributed equally to this work.
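To make the search loop described in the abstract concrete, below is a minimal, self-contained Python sketch of evolutionary proxy discovery. It is illustrative rather than the released Auto-DAS implementation: the operator vocabulary (`UNARY`, `REDUCE`), the proxy depth, the adaptive-elite schedule, and the random placeholder statistics with synthetic "distillation accuracies" are all assumptions made for this example. In the actual framework, the proxy inputs would be instinct and interaction statistics extracted from teacher-student pairs, and fitness would be the ranking correlation against measured distillation accuracy on a sampled set of students.

```python
# Sketch of evolutionary proxy discovery (illustrative assumptions, not the paper's code).
# A candidate proxy is a small computation graph: a chain of unary transforms followed by
# a reduction, applied to a per-architecture statistic vector. Fitness is the Kendall rank
# correlation between proxy scores and (here, synthetic) distillation accuracies.
import random
import numpy as np
from scipy.stats import kendalltau

# Hypothetical operator vocabulary standing in for the paper's transformations/operators.
UNARY = {
    "abs":  np.abs,
    "log":  lambda x: np.log(np.abs(x) + 1e-8),
    "sq":   np.square,
    "norm": lambda x: x / (np.linalg.norm(x) + 1e-8),
}
REDUCE = {"sum": np.sum, "mean": np.mean, "std": np.std}

def random_proxy(depth=3):
    """A proxy is a chain of unary ops followed by one reduction."""
    return [random.choice(list(UNARY)) for _ in range(depth)] + [random.choice(list(REDUCE))]

def proxy_score(proxy, stats):
    """Evaluate the computation graph on one architecture's statistic vector."""
    x = stats
    for op in proxy[:-1]:
        x = UNARY[op](x)
    return float(REDUCE[proxy[-1]](x))

def fitness(proxy, dataset):
    """Kendall tau between proxy scores and distillation accuracies."""
    scores = [proxy_score(proxy, s) for s, _ in dataset]
    accs = [a for _, a in dataset]
    tau, _ = kendalltau(scores, accs)
    return -1.0 if np.isnan(tau) else float(tau)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(proxy, p=0.3):
    child = list(proxy)
    for i in range(len(child) - 1):
        if random.random() < p:
            child[i] = random.choice(list(UNARY))
    if random.random() < p:
        child[-1] = random.choice(list(REDUCE))
    return child

def evolve(dataset, pop_size=20, generations=30):
    population = [random_proxy() for _ in range(pop_size)]
    for g in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, dataset), reverse=True)
        # Adaptive-elite flavour: the elite fraction grows with the generation index,
        # shifting gradually from exploration toward exploitation.
        k = max(2, int(pop_size * (0.2 + 0.3 * g / generations)))
        elites = ranked[:k]
        children = [mutate(crossover(*random.sample(elites, 2)))
                    for _ in range(pop_size - k)]
        population = elites + children
    return max(population, key=lambda p: fitness(p, dataset))

if __name__ == "__main__":
    # Placeholder "search space": 64 students, each with a 128-d statistic vector and a
    # synthetic distillation accuracy correlated with the mean statistic.
    rng = np.random.default_rng(0)
    dataset = []
    for _ in range(64):
        s = rng.normal(size=128)
        dataset.append((s, float(s.mean() + 0.1 * rng.normal())))
    best = evolve(dataset)
    print("best proxy:", best, "tau:", round(fitness(best, dataset), 3))
```

The growing elite fraction used above is only one simple way to trade exploration for exploitation as the search matures; the paper's adaptive-elite selection may differ in its exact schedule.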
References
Abdi, H.: The Kendall rank correlation coefficient. Encyclopedia Meas. Stat. 2, 508–510 (2007)
Akhauri, Y., Munoz, J.P., Jain, N., Iyer, R.: EZNAS: evolving zero-cost proxies for neural architecture scoring. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) NeurIPS (2022). https://openreview.net/forum?id=lSqaDG4dvdt
Baker, B., Gupta, O., Naik, N., Raskar, R.: Designing neural network architectures using reinforcement learning. In: ICLR (2017)
Bowley, A.: The standard deviation of the correlation coefficient. J. Am. Stat. Assoc. 23(161), 31–34 (1928)
Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint, arXiv:2005.14165 (2020)
Dong, P., et al.: Pruner-zero: evolving symbolic pruning metric from scratch for large language models. In: ICML (2024)
Dong, P., Li, L., Wei, Z.: DisWOT: student architecture search for distillation without training. In: CVPR (2023)
Dong, P., Li, L., Wei, Z., Niu, X., Tian, Z., Pan, H.: EMQ: evolving training-free proxies for automated mixed precision quantization. In: ICCV, pp. 17076–17086 (2023)
Dong, P., et al.: Prior-guided one-shot neural architecture search. arXiv preprint arXiv:2206.13329 (2022)
Dong, X., Yang, Y.: Searching for a robust neural architecture in four GPU hours. In: CVPR (2019)
Dong, X., Yang, Y.: NAS-bench-201: extending the scope of reproducible neural architecture search. In: ICLR (2020)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021)
Falkner, S., Klein, A., Hutter, F.: BOHB: Robust and efficient hyperparameter optimization at scale. In: ICML (2018)
Gu, J., Tresp, V.: Search for better students to learn distilled knowledge. arXiv preprint arXiv:2001.11612 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., Oh, S.J.: Rethinking spatial dimensions of vision transformers. In: ICCV (2021)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Hu, S., et al.: DSNAS: direct neural architecture search without parameter retraining. In: CVPR (2020)
Hu, Y., et al.: Angle-based search space shrinking for neural architecture search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 119–134. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_8
Hu, Y., Wang, X., Li, L., Gu, Q.: Improving one-shot NAS with shrinking-and-expanding supernet. Pattern Recogn. (2021)
Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP (2016)
Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)
Lee, N., Ajanthan, T., Torr, P.: SNIP: single-shot network pruning based on connection sensitivity. In: ICLR (2019)
Li, K., Yu, R., Wang, Z., Yuan, L., Song, G., Chen, J.: Locality guidance for improving vision transformers on tiny datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13684, pp. 110–127. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20053-3_7
Li, L., Talwalkar, A.S.: Random search and reproducibility for neural architecture search. arXiv (2019)
Li, L.: Self-regulated feature learning via teacher-free feature distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13686, pp. 347–363. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_20
Li, L., et al.: DetKDS: knowledge distillation search for object detectors. In: ICML (2024)
Li, L., Dong, P., Li, A., Wei, Z., Yang, Y.: KD-Zero: evolving knowledge distiller for any teacher-student pairs. In: NeurIPS (2024)
Li, L., Dong, P., Wei, Z., Yang, Y.: Automated knowledge distillation via Monte Carlo tree search. In: ICCV (2023)
Li, L., Jin, Z.: Shadow knowledge distillation: bridging offline and online knowledge transfer. In: NeurIPS (2022)
Li, L., et al.: Auto-GAS: automated proxy discovery for training-free generative architecture search. In: ECCV (2024)
Li, L., Wang, Y., Yao, A., Qian, Y., Zhou, X., He, K.: Explicit connection distillation. In: ICLR (2020)
Li, L., et al.: AttnZero: efficient attention discovery for vision transformers. In: ECCV (2024)
Lin, M., et al.: Zen-NAS: a zero-shot NAS for high-performance image recognition. In: ICCV (2021)
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: ICLR (2019)
Liu, Y., et al.: Search to distill: pearls are everywhere but not the eyes. In: CVPR (2020)
Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: ICML (2021)
Mirzadeh, S.I., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., Ghasemzadeh, H.: Improved knowledge distillation via teacher assistant. In: AAAI (2020)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729. IEEE (2008)
Park, W., Lu, Y., Cho, M., Kim, D.: Relational knowledge distillation. In: CVPR (2019)
Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: ICML, pp. 4092–4101 (2018)
Real, E., Liang, C., So, D.R., Le, Q.V.: AutoML-Zero: evolving machine learning algorithms from scratch. In: ICML (2020)
Shao, S., Dai, X., Yin, S., Li, L., Chen, H., Hu, Y.: Catch-up distillation: you only need to train once for accelerating sampling. arXiv preprint arXiv:2305.10769 (2023)
Stephanou, M., Varughese, M.: Sequential estimation of Spearman rank correlation using Hermite series estimators. J. Multivar. Anal. 186, 104783 (2021)
Tanaka, H., Kunin, D., Yamins, D.L., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow. In: NeurIPS (2020)
Theis, L., Korshunova, I., Tejani, A., Huszár, F.: Faster gaze prediction with dense networks and fisher pruning. arXiv abs/1801.05787 (2018)
Tian, Y., Krishnan, D., Isola, P.: Contrastive representation distillation. In: ICLR (2020)
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: ICML (2021)
Wang, C., Zhang, G., Grosse, R.: Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376 (2020)
Wang, T., Yuan, L., Zhang, X., Feng, J.: Distilling object detectors with fine-grained feature imitation. In: CVPR (2019)
Liu, X., Li, L., Li, C., Yao, A.: NORM: knowledge distillation via N-to-one representation matching. In: ICLR (2023)
Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-bench-101: towards reproducible neural architecture search. In: ICML (2019)
You, S., Huang, T., Yang, M., Wang, F., Qian, C., Zhang, C.: GreedyNAS: towards fast one-shot NAS with greedy supernet. In: CVPR (2020)
Zhang, L., Ma, K.: Improve object detection with feature-based knowledge distillation: towards accurate and efficient detectors. In: ICLR (2020)
Zhou, H., et al.: Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective. In: ICLR (2021)
Zhu, C., Li, L., Wu, Y., Sun, Z.: SasWOT: real-time semantic segmentation architecture search without training. In: AAAI (2024)
Zhu, C., Chen, W., Peng, T., Wang, Y., Jin, M.: Hard sample aware noise robust learning for histopathology image classification. IEEE Trans. Med. Imaging (2021)
Wei, Z., et al.: Auto-Prox: training-free vision transformer architecture search via automatic proxy discovery. In: AAAI (2024)
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: CVPR (2018)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, H., Li, L., Dong, P., Wei, Z., Shao, S. (2025). Auto-DAS: Automated Proxy Discovery for Training-Free Distillation-Aware Architecture Search. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15063. Springer, Cham. https://doi.org/10.1007/978-3-031-72652-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72651-4
Online ISBN: 978-3-031-72652-1