Powering One-Shot Topological NAS with Stabilized Share-Parameter Proxy

Authors:

Junjie Yan

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV

Pages 625–641

https://doi.org/10.1007/978-3-030-58568-6_37

Published: 23 August 2020

Abstract

One-shot NAS methods have attracted much interest from the research community due to their remarkable training efficiency and capacity to discover high-performance models. However, the search spaces of previous one-shot works usually relied on hand-crafted design and lacked flexibility in network topology. In this work, we enhance one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space (i.e., over $3.4 \times 10^{10}$ different topological structures). Specifically, the difficulties of architecture search in such a complex space are eliminated by the proposed stabilized share-parameter proxy, which employs Stochastic Gradient Langevin Dynamics to enable fast shared-parameter sampling, so as to achieve stabilized measurement of architecture performance even in search spaces with complex topological structures. The proposed method, namely Stabilized Topological Neural Architecture Search (ST-NAS), achieves state-of-the-art performance under a Multiply-Adds (MAdds) constraint on ImageNet. Our lite model ST-NAS-A achieves 76.4% top-1 accuracy with only 326M MAdds. Our moderate model ST-NAS-B achieves 77.9% top-1 accuracy while requiring just 503M MAdds. Both models offer superior performance in comparison to other concurrent works on one-shot NAS.
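
For readers unfamiliar with Stochastic Gradient Langevin Dynamics, the following is a minimal sketch of the generic SGLD update (a minibatch gradient step plus Gaussian noise whose variance equals the step size), applied to a toy problem: sampling the posterior over the mean of unit-variance Gaussian data under a flat prior. It is not the ST-NAS implementation. The function names `sgld_step` and `run_sgld`, the toy data, and all hyperparameter values are assumptions chosen only for illustration.

```python
import math
import random


def sgld_step(theta, minibatch, n_total, step_size, rng):
    """One generic SGLD update for the mean of a unit-variance Gaussian."""
    # Unbiased minibatch estimate of the gradient of the negative
    # log-posterior (flat prior assumed, so only the likelihood contributes).
    scale = n_total / len(minibatch)
    grad = scale * sum(theta - x for x in minibatch)
    # Injected Gaussian noise with variance equal to the step size.
    noise = rng.gauss(0.0, math.sqrt(step_size))
    return theta - 0.5 * step_size * grad + noise


def run_sgld(data, theta0=0.0, step_size=1e-4, batch_size=100,
             n_steps=6000, burn_in=1000, seed=0):
    """Run SGLD and return the parameter samples collected after burn-in."""
    rng = random.Random(seed)
    theta, samples = theta0, []
    for t in range(n_steps):
        batch = rng.sample(data, batch_size)
        theta = sgld_step(theta, batch, len(data), step_size, rng)
        if t >= burn_in:
            samples.append(theta)
    return samples


# Toy data (hypothetical): 1000 draws from a Gaussian with mean 2.
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]

samples = run_sgld(data)
data_mean = sum(data) / len(data)
sample_mean = sum(samples) / len(samples)
print(f"data mean {data_mean:.3f}, SGLD sample mean {sample_mean:.3f}")
```

Each iterate after burn-in is treated as one approximate posterior sample, so a single training run yields many parameter samples. The abstract describes using this kind of sampling for shared parameters; how the samples are combined to score architectures is not specified there and is not modeled in this sketch.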

Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV

Aug 2020

842 pages

ISBN: 978-3-030-58567-9

DOI: 10.1007/978-3-030-58568-6

Editors:
Andrea Vedaldi (University of Oxford, Oxford, UK)
Horst Bischof (Graz University of Technology, Graz, Austria)
Thomas Brox (University of Freiburg, Freiburg im Breisgau, Germany)
Jan-Michael Frahm (University of North Carolina at Chapel Hill, Chapel Hill, NC, USA)

© Springer Nature Switzerland AG 2020.

Publisher

Springer-Verlag

Berlin, Heidelberg
