Abstract
Distillation-aware Architecture Search (DAS) seeks the student architecture that achieves the best performance when distilling knowledge from a given teacher model. Previous DAS methods rely on time-consuming training-based search. Recently, the training-free DAS method DisWOT introduced KD-based proxies and achieved significant search acceleration; however, it requires manual proxy design and generalizes poorly to diverse architectures such as the Vision Transformer (ViT). To address these issues, we present Auto-DAS, an automatic proxy-discovery framework for training-free DAS based on an Evolutionary Algorithm (EA). Specifically, we empirically find that proxies conditioned on student instinct statistics and teacher-student interaction statistics effectively predict distillation accuracy. We therefore represent each proxy as a computation graph that takes instinct and interaction statistics as inputs, and construct a proxy search space whose basic transformations and network distance operators are inspired by previous proxy and KD-loss designs. Our EA then initializes a population of candidate proxies, evaluates them, applies crossover and mutation, and selects the candidate whose scores correlate best with distillation accuracy. An adaptive-elite selection strategy further improves search efficiency and balances exploitation and exploration. Finally, we run training-free DAS with the discovered proxy and distill the selected optimal student. In this way, our auto-discovery framework eliminates manual design and tuning, and adapts to different search spaces through direct correlation optimization. Extensive experiments demonstrate that Auto-DAS generalizes well across architectures and search spaces (e.g., ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy. Code at: https://github.com/lliai/Auto-DAS.
L. Li—Contributed equally to this work.
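To make the search loop described in the abstract concrete, below is a minimal, self-contained Python sketch of evolutionary proxy discovery. It is illustrative rather than the released Auto-DAS implementation: the operator vocabulary (`UNARY`, `REDUCE`), the proxy depth, the adaptive-elite schedule, and the random placeholder statistics with synthetic "distillation accuracies" are all assumptions made for this example. In the actual framework, the proxy inputs would be instinct and interaction statistics extracted from teacher-student pairs, and fitness would be the ranking correlation against measured distillation accuracy on a sampled set of students.

```python
# Sketch of evolutionary proxy discovery (illustrative assumptions, not the paper's code).
# A candidate proxy is a small computation graph: a chain of unary transforms followed by
# a reduction, applied to a per-architecture statistic vector. Fitness is the Kendall rank
# correlation between proxy scores and (here, synthetic) distillation accuracies.
import random
import numpy as np
from scipy.stats import kendalltau

# Hypothetical operator vocabulary standing in for the paper's transformations/operators.
UNARY = {
    "abs":  np.abs,
    "log":  lambda x: np.log(np.abs(x) + 1e-8),
    "sq":   np.square,
    "norm": lambda x: x / (np.linalg.norm(x) + 1e-8),
}
REDUCE = {"sum": np.sum, "mean": np.mean, "std": np.std}

def random_proxy(depth=3):
    """A proxy is a chain of unary ops followed by one reduction."""
    return [random.choice(list(UNARY)) for _ in range(depth)] + [random.choice(list(REDUCE))]

def proxy_score(proxy, stats):
    """Evaluate the computation graph on one architecture's statistic vector."""
    x = stats
    for op in proxy[:-1]:
        x = UNARY[op](x)
    return float(REDUCE[proxy[-1]](x))

def fitness(proxy, dataset):
    """Kendall tau between proxy scores and distillation accuracies."""
    scores = [proxy_score(proxy, s) for s, _ in dataset]
    accs = [a for _, a in dataset]
    tau, _ = kendalltau(scores, accs)
    return -1.0 if np.isnan(tau) else float(tau)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(proxy, p=0.3):
    child = list(proxy)
    for i in range(len(child) - 1):
        if random.random() < p:
            child[i] = random.choice(list(UNARY))
    if random.random() < p:
        child[-1] = random.choice(list(REDUCE))
    return child

def evolve(dataset, pop_size=20, generations=30):
    population = [random_proxy() for _ in range(pop_size)]
    for g in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, dataset), reverse=True)
        # Adaptive-elite flavour: the elite fraction grows with the generation index,
        # shifting gradually from exploration toward exploitation.
        k = max(2, int(pop_size * (0.2 + 0.3 * g / generations)))
        elites = ranked[:k]
        children = [mutate(crossover(*random.sample(elites, 2)))
                    for _ in range(pop_size - k)]
        population = elites + children
    return max(population, key=lambda p: fitness(p, dataset))

if __name__ == "__main__":
    # Placeholder "search space": 64 students, each with a 128-d statistic vector and a
    # synthetic distillation accuracy correlated with the mean statistic.
    rng = np.random.default_rng(0)
    dataset = []
    for _ in range(64):
        s = rng.normal(size=128)
        dataset.append((s, float(s.mean() + 0.1 * rng.normal())))
    best = evolve(dataset)
    print("best proxy:", best, "tau:", round(fitness(best, dataset), 3))
```

The growing elite fraction used above is only one simple way to trade exploration for exploitation as the search matures; the paper's adaptive-elite selection may differ in its exact schedule.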
References
Abdi, H.: The Kendall rank correlation coefficient. Encyclopedia Meas. Stat. 2, 508–510 (2007)
Akhauri, Y., Munoz, J.P., Jain, N., Iyer, R.: EZNAS: evolving zero-cost proxies for neural architecture scoring. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) NeurIPS (2022). https://openreview.net/forum?id=lSqaDG4dvdt
Baker, B., Gupta, O., Naik, N., Raskar, R.: Designing neural network architectures using reinforcement learning. In: ICLR (2017)
Bowley, A.: The standard deviation of the correlation coefficient. J. Am. Stat. Assoc. 23(161), 31–34 (1928)
Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint, arXiv:2005.14165 (2020)
Dong, P., et al.: Pruner-zero: evolving symbolic pruning metric from scratch for large language models. In: ICML (2024)
Dong, P., Li, L., Wei, Z.: DisWOT: student architecture search for distillation without training. In: CVPR (2023)
Dong, P., Li, L., Wei, Z., Niu, X., Tian, Z., Pan, H.: EMQ: evolving training-free proxies for automated mixed precision quantization. In: ICCV, pp. 17076–17086 (2023)
Dong, P., et al.: Prior-guided one-shot neural architecture search. arXiv preprint arXiv:2206.13329 (2022)
Dong, X., Yang, Y.: Searching for a robust neural architecture in four GPU hours. In: CVPR (2019)
Dong, X., Yang, Y.: NAS-bench-201: extending the scope of reproducible neural architecture search. In: ICLR (2020)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021)
Falkner, S., Klein, A., Hutter, F.: BOHB: Robust and efficient hyperparameter optimization at scale. In: ICML (2018)
Gu, J., Tresp, V.: Search for better students to learn distilled knowledge. arXiv preprint arXiv:2001.11612 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Heo, B., Yun, S., Han, D., Chun, S., Choe, J., Oh, S.J.: Rethinking spatial dimensions of vision transformers. In: ICCV (2021)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Hu, S., et al.: DSNAS: direct neural architecture search without parameter retraining. In: CVPR (2020)
Hu, Y., et al.: Angle-based search space shrinking for neural architecture search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 119–134. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_8
Hu, Y., Wang, X., Li, L., Gu, Q.: Improving one-shot NAS with shrinking-and-expanding supernet. Pattern Recogn. (2021)
Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP (2016)
Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)
Lee, N., Ajanthan, T., Torr, P.: SNIP: single-shot network pruning based on connection sensitivity. In: ICLR (2019)
Li, K., Yu, R., Wang, Z., Yuan, L., Song, G., Chen, J.: Locality guidance for improving vision transformers on tiny datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13684, pp. 110–127. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20053-3_7
Li, L., Talwalkar, A.S.: Random search and reproducibility for neural architecture search. arXiv (2019)
Li, L.: Self-regulated feature learning via teacher-free feature distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13686, pp. 347–363. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_20
Li, L., et al.: DetKDS: knowledge distillation search for object detectors. In: ICML (2024)
Li, L., Dong, P., Li, A., Wei, Z., Yang, Y.: KD-Zero: evolving knowledge distiller for any teacher-student pairs. In: NeurIPS (2024)
Li, L., Dong, P., Wei, Z., Yang, Y.: Automated knowledge distillation via Monte Carlo tree search. In: ICCV (2023)
Li, L., Jin, Z.: Shadow knowledge distillation: bridging offline and online knowledge transfer. In: NeurIPS (2022)
Li, L., et al.: Auto-GAS: automated proxy discovery for training-free generative architecture search. In: ECCV (2024)
Li, L., Wang, Y., Yao, A., Qian, Y., Zhou, X., He, K.: Explicit connection distillation. In: ICLR (2020)
Li, L., et al.: AttnZero: efficient attention discovery for vision transformers. In: ECCV (2024)
Lin, M., et al.: Zen-NAS: a zero-shot NAS for high-performance image recognition. In: ICCV (2021)
Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: ICLR (2019)
Liu, Y., et al.: Search to distill: pearls are everywhere but not the eyes. In: CVPR (2020)
Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: ICML (2021)
Mirzadeh, S.I., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., Ghasemzadeh, H.: Improved knowledge distillation via teacher assistant. In: AAAI (2020)
Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729. IEEE (2008)
Park, W., Lu, Y., Cho, M., Kim, D.: Relational knowledge distillation. In: CVPR (2019)
Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: ICML, pp. 4092–4101 (2018)
Real, E., Liang, C., So, D.R., Le, Q.V.: AutoML-Zero: evolving machine learning algorithms from scratch. In: ICML (2020)
Shao, S., Dai, X., Yin, S., Li, L., Chen, H., Hu, Y.: Catch-up distillation: you only need to train once for accelerating sampling. arXiv preprint arXiv:2305.10769 (2023)
Stephanou, M., Varughese, M.: Sequential estimation of Spearman rank correlation using Hermite series estimators. J. Multivar. Anal. 186, 104783 (2021)
Tanaka, H., Kunin, D., Yamins, D.L., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow. In: NeurIPS (2020)
Theis, L., Korshunova, I., Tejani, A., Huszár, F.: Faster gaze prediction with dense networks and fisher pruning. arXiv abs/1801.05787 (2018)
Tian, Y., Krishnan, D., Isola, P.: Contrastive representation distillation. In: ICLR (2020)
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: ICML (2021)
Wang, C., Zhang, G., Grosse, R.: Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376 (2020)
Wang, T., Yuan, L., Zhang, X., Feng, J.: Distilling object detectors with fine-grained feature imitation. In: CVPR (2019)
Liu, X., Li, L., Li, C., Yao, A.: NORM: knowledge distillation via N-to-one representation matching. In: ICLR (2023)
Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-bench-101: towards reproducible neural architecture search. In: ICML (2019)
You, S., Huang, T., Yang, M., Wang, F., Qian, C., Zhang, C.: GreedyNAS: towards fast one-shot NAS with greedy supernet. In: CVPR (2020)
Zhang, L., Ma, K.: Improve object detection with feature-based knowledge distillation: towards accurate and efficient detectors. In: ICLR (2020)
Zhou, H., et al.: Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective. In: ICLR (2021)
Zhu, C., Li, L., Wu, Y., Sun, Z.: SasWOT: real-time semantic segmentation architecture search without training. In: AAAI (2024)
Zhu, C., Chen, W., Peng, T., Wang, Y., Jin, M.: Hard sample aware noise robust learning for histopathology image classification. IEEE Trans. Med. Imaging (2021)
Wei, Z., et al.: Auto-Prox: training-free vision transformer architecture search via automatic proxy discovery. In: AAAI (2024)
Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: CVPR (2018)
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, H., Li, L., Dong, P., Wei, Z., Shao, S. (2025). Auto-DAS: Automated Proxy Discovery for Training-Free Distillation-Aware Architecture Search. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15063. Springer, Cham. https://doi.org/10.1007/978-3-031-72652-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72651-4
Online ISBN: 978-3-031-72652-1