Auto-DAS: Automated Proxy Discovery for Training-Free Distillation-Aware Architecture Search

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15063)

Abstract

Distillation-aware Architecture Search (DAS) seeks to discover the student architecture that delivers the best performance when distilling knowledge from a given teacher model. Previous DAS methods rely on time-consuming training-based search processes. Recently, the training-free DAS method DisWOT introduced KD-based proxies and achieved a substantial search speed-up. However, we observe that DisWOT suffers from limitations such as the need for manual design and poor generalization to diverse architectures, such as the Vision Transformer (ViT). To address these issues, we present Auto-DAS, an automatic proxy-discovery framework for training-free DAS based on an Evolutionary Algorithm (EA). Specifically, we empirically find that proxies conditioned on student instinct statistics and teacher-student interaction statistics can effectively predict distillation accuracy. We therefore represent each proxy as a computation graph and construct the proxy search space with instinct and interaction statistics as inputs. To cover promising proxies, the search space incorporates various basic transformations and network distance operators inspired by previous proxy and KD-loss designs. Our EA then initializes a population of candidate proxies, evaluates them, performs crossover and mutation, and selects the candidate whose scores correlate best with distillation accuracy. We introduce an adaptive-elite selection strategy to improve search efficiency and balance exploitation with exploration. Finally, we conduct training-free DAS with the discovered proxy and then distill the selected optimal student. In this way, our auto-discovery framework eliminates manual design and tuning, and adapts to different search spaces through direct correlation optimization. Extensive experiments demonstrate that Auto-DAS generalizes well to various architectures and search spaces (e.g., ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy. Code is available at: https://github.com/lliai/Auto-DAS.
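
The sketch below is a minimal illustration of the recipe described in the abstract: a candidate proxy is a small computation graph built from basic transformations and a distance-style operator over instinct and interaction statistics, each candidate is scored by its Kendall rank correlation with distillation accuracy on a set of reference architectures, and an evolutionary loop with crossover, mutation, and elite selection keeps the best-correlated proxy. All names and the toy data here (UNARY_OPS, Proxy, evolve_proxy, the random statistics) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of evolutionary proxy discovery; not the authors' released code.
import math
import random

from scipy.stats import kendalltau

# Candidate building blocks: basic transformations and distance-style operators
# applied to scalar instinct / interaction statistics (stand-ins for the paper's
# richer tensor statistics).
UNARY_OPS = {
    "identity": lambda x: x,
    "abs": abs,
    "square": lambda x: x * x,
    "log1p_abs": lambda x: math.log1p(abs(x)),
}
BINARY_OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "l1_dist": lambda a, b: abs(a - b),
}


class Proxy:
    """Tiny computation graph: binary_op(unary1(instinct), unary2(interaction))."""

    def __init__(self, u1, u2, b):
        self.u1, self.u2, self.b = u1, u2, b

    def score(self, instinct, interaction):
        return BINARY_OPS[self.b](UNARY_OPS[self.u1](instinct),
                                  UNARY_OPS[self.u2](interaction))


def random_proxy():
    return Proxy(random.choice(list(UNARY_OPS)),
                 random.choice(list(UNARY_OPS)),
                 random.choice(list(BINARY_OPS)))


def fitness(proxy, dataset):
    # Fitness = Kendall rank correlation between proxy scores and the distillation
    # accuracies of a set of reference student architectures.
    scores = [proxy.score(inst, inter) for inst, inter, _ in dataset]
    accs = [acc for _, _, acc in dataset]
    tau, _ = kendalltau(scores, accs)
    return -1.0 if math.isnan(tau) else tau  # constant scores yield NaN


def mutate(p):
    child = Proxy(p.u1, p.u2, p.b)
    field = random.choice(["u1", "u2", "b"])
    pool = BINARY_OPS if field == "b" else UNARY_OPS
    setattr(child, field, random.choice(list(pool)))
    return child


def crossover(p, q):
    return Proxy(random.choice([p.u1, q.u1]),
                 random.choice([p.u2, q.u2]),
                 random.choice([p.b, q.b]))


def evolve_proxy(dataset, pop_size=20, generations=30, elite_frac=0.25):
    population = [random_proxy() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, dataset), reverse=True)
        elites = ranked[: max(2, int(elite_frac * pop_size))]
        children = [mutate(crossover(*random.sample(elites, 2)))
                    for _ in range(pop_size - len(elites))]
        population = elites + children
    return max(population, key=lambda p: fitness(p, dataset))


# Toy usage: each tuple is (instinct_stat, interaction_stat, distillation_accuracy).
data = [(random.random(), random.random(), random.random()) for _ in range(32)]
best = evolve_proxy(data)
print(best.u1, best.u2, best.b, round(fitness(best, data), 3))
```

In the paper, the elite fraction is adapted during the search to balance exploitation and exploration; the fixed elite_frac above is a simplification.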

L. Li—Contributed equally to this work.

References

  1. Abdi, H.: The kendall rank correlation coefficient. Encyclopedia Meas. Stat. 2, 508–510 (2007)

  2. Akhauri, Y., Munoz, J.P., Jain, N., Iyer, R.: EZNAS: evolving zero-cost proxies for neural architecture scoring. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) NeurIPS (2022). https://openreview.net/forum?id=lSqaDG4dvdt

  3. Baker, B., Gupta, O., Naik, N., Raskar, R.: Designing neural network architectures using reinforcement learning. In: ICLR (2017)

  4. Bowley, A.: The standard deviation of the correlation coefficient. J. Am. Stat. Assoc. 23(161), 31–34 (1928)

  5. Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint, arXiv:2005.14165 (2020)

  6. Dong, P., et al.: Pruner-zero: evolving symbolic pruning metric from scratch for large language models. In: ICML (2024)

  7. Dong, P., Li, L., Wei, Z.: DisWOT: student architecture search for distillation without training. In: CVPR (2023)

  8. Dong, P., Li, L., Wei, Z., Niu, X., Tian, Z., Pan, H.: EMQ: evolving training-free proxies for automated mixed precision quantization. In: ICCV, pp. 17076–17086 (2023)

  9. Dong, P., et al.: Prior-guided one-shot neural architecture search. arXiv preprint arXiv:2206.13329 (2022)

  10. Dong, X., Yang, Y.: Searching for a robust neural architecture in four GPU hours. In: CVPR (2019)

  11. Dong, X., Yang, Y.: NAS-Bench-201: extending the scope of reproducible neural architecture search. In: ICLR (2020)

  12. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR (2021)

  13. Falkner, S., Klein, A., Hutter, F.: BOHB: Robust and efficient hyperparameter optimization at scale. In: ICML (2018)

  14. Gu, J., Tresp, V.: Search for better students to learn distilled knowledge. arXiv preprint arXiv:2001.11612 (2020)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  16. Heo, B., Yun, S., Han, D., Chun, S., Choe, J., Oh, S.J.: Rethinking spatial dimensions of vision transformers. In: ICCV (2021)

  17. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  18. Hu, S., et al.: DSNAS: direct neural architecture search without parameter retraining. In: CVPR (2020)

  19. Hu, Y., et al.: Angle-based search space shrinking for neural architecture search. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12364, pp. 119–134. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58529-7_8

  20. Hu, Y., Wang, X., Li, L., Gu, Q.: Improving one-shot NAS with shrinking-and-expanding supernet. Pattern Recogn. (2021)

  21. Kim, Y., Rush, A.M.: Sequence-level knowledge distillation. In: EMNLP (2016)

  22. Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)

  23. Lee, N., Ajanthan, T., Torr, P.: Snip: single-shot network pruning based on connection sensitivity. In: ICLR (2018)

  24. Li, K., Yu, R., Wang, Z., Yuan, L., Song, G., Chen, J.: Locality guidance for improving vision transformers on tiny datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13684, pp. 110–127. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20053-3_7

  25. Li, L., Talwalkar, A.S.: Random search and reproducibility for neural architecture search. arXiv (2019)

  26. Li, L.: Self-regulated feature learning via teacher-free feature distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13686, pp. 347–363. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19809-0_20

  27. Li, L., et al.: DetKDS: knowledge distillation search for object detectors. In: ICML (2024)

  28. Li, L., Dong, P., Li, A., Wei, Z., Yang, Y.: KD-Zero: evolving knowledge distiller for any teacher-student pairs. In: NeurIPS (2024)

  29. Li, L., Dong, P., Wei, Z., Yang, Y.: Automated knowledge distillation via Monte Carlo tree search. In: ICCV (2023)

  30. Li, L., Jin, Z.: Shadow knowledge distillation: bridging offline and online knowledge transfer. In: NeurIPS (2022)

  31. Li, L., et al.: Auto-GAS: automated proxy discovery for training-free generative architecture search. In: ECCV (2024)

  32. Li, L., Wang, Y., Yao, A., Qian, Y., Zhou, X., He, K.: Explicit connection distillation. In: ICLR (2020)

  33. Li, L., et al.: AttnZero: efficient attention discovery for vision transformers. In: ECCV (2024)

  34. Lin, M., et al.: Zen-NAS: a zero-shot NAS for high-performance image recognition. In: ICCV (2021)

  35. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. In: ICLR (2019)

  36. Liu, Y., et al.: Search to distill: pearls are everywhere but not the eyes. In: CVPR (2020)

  37. Mellor, J., Turner, J., Storkey, A., Crowley, E.J.: Neural architecture search without training. In: ICML (2021)

  38. Mirzadeh, S.I., Farajtabar, M., Li, A., Levine, N., Matsukawa, A., Ghasemzadeh, H.: Improved knowledge distillation via teacher assistant. In: AAAI (2020)

  39. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: 2008 Sixth Indian Conference on Computer Vision, Graphics and Image Processing, pp. 722–729. IEEE (2008)

  40. Park, W., Lu, Y., Cho, M., Kim, D.: Relational knowledge distillation. In: CVPR (2019)

  41. Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., Dean, J.: Efficient neural architecture search via parameter sharing. In: ICML, pp. 4092–4101 (2018)

  42. Real, E., Liang, C., So, D.R., Le, Q.V.: AutoML-Zero: evolving machine learning algorithms from scratch. In: ICML (2020)

  43. Shao, S., Dai, X., Yin, S., Li, L., Chen, H., Hu, Y.: Catch-up distillation: you only need to train once for accelerating sampling. arXiv preprint arXiv:2305.10769 (2023)

  44. Stephanou, M., Varughese, M.: Sequential estimation of spearman rank correlation using Hermite series estimators. J. Multivar. Anal. 186, 104783 (2021)

  45. Tanaka, H., Kunin, D., Yamins, D.L., Ganguli, S.: Pruning neural networks without any data by iteratively conserving synaptic flow. In: NeurIPS (2020)

  46. Theis, L., Korshunova, I., Tejani, A., Huszár, F.: Faster gaze prediction with dense networks and fisher pruning. arXiv abs/1801.05787 (2018)

  47. Tian, Y., Krishnan, D., Isola, P.: Contrastive representation distillation. In: ICLR (2020)

  48. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jegou, H.: Training data-efficient image transformers & distillation through attention. In: ICML (2021)

  49. Wang, C., Zhang, G., Grosse, R.: Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376 (2020)

  50. Wang, T., Yuan, L., Zhang, X., Feng, J.: Distilling object detectors with fine-grained feature imitation. In: CVPR (2019)

  51. Liu, X., Li, L., Li, C., Yao, A.: NORM: knowledge distillation via N-to-one representation matching (2022)

  52. Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: NAS-Bench-101: towards reproducible neural architecture search. In: ICML (2019)

  53. You, S., Huang, T., Yang, M., Wang, F., Qian, C., Zhang, C.: Greedynas: towards fast one-shot NAS with greedy supernet. In: CVPR (2020)

  54. Zhang, L., Ma, K.: Improve object detection with feature-based knowledge distillation: towards accurate and efficient detectors. In: ICLR (2020)

  55. Zhou, H., et al.: Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective (2021)

  56. Zhu, C., Li, L., Wu, Y., Sun, Z.: Saswot: real-time semantic segmentation architecture search without training. In: AAAI (2024)

  57. Zhu, C., Chen, W., Peng, T., Wang, Y., Jin, M.: Hard sample aware noise robust learning for histopathology image classification. TMI (2021)

  58. Wei, Z., et al.: Auto-Prox: training-free vision transformer architecture search via automatic proxy discovery. In: AAAI (2024)

  59. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. In: CVPR (2018)

Author information

Corresponding author

Correspondence to Lujun Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 380 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, H., Li, L., Dong, P., Wei, Z., Shao, S. (2025). Auto-DAS: Automated Proxy Discovery for Training-Free Distillation-Aware Architecture Search. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15063. Springer, Cham. https://doi.org/10.1007/978-3-031-72652-1_4

  • DOI: https://doi.org/10.1007/978-3-031-72652-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72651-4

  • Online ISBN: 978-3-031-72652-1

  • eBook Packages: Computer Science, Computer Science (R0)
