Abstract
Modern deep neural networks are equipped with normalization layers, such as batch normalization or layer normalization, to enhance and stabilize training dynamics. If a network contains such normalization layers, its optimization objective is invariant to the scale of the network parameters: the output depends only on the direction of the weights, not on their magnitude. We first identify a feature shared by good hyperparameter combinations on such a scale-invariant network, spanning the learning rate, weight decay, number of data samples, and batch size: hyperparameter setups that lead to good performance exhibit similar degrees of angular update over one epoch. Using a stochastic differential equation, we analyze the angular update and show how each hyperparameter affects it. From this relationship, we derive a simple hyperparameter tuning method and apply it to efficient hyperparameter search.
J. Yun—Work done during an internship at LG AI Research.
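To make the central quantity concrete, the sketch below measures the per-epoch angular update of a scale-invariant parameter tensor. This is an illustration of the quantity the abstract describes, not the authors' code; `model` and `train_one_epoch` are hypothetical placeholders, and the equilibrium scaling noted in the comments is drawn from prior work on spherical motion dynamics rather than from this paper.

```python
import torch
import torch.nn.functional as F

def angular_update(w_before: torch.Tensor, w_after: torch.Tensor) -> float:
    """Angle (in radians) between a weight tensor before and after training.

    For a scale-invariant parameter, only this directional change can alter
    the network's output; any change in norm is absorbed by normalization.
    """
    cos = F.cosine_similarity(w_before.flatten(), w_after.flatten(), dim=0)
    return torch.acos(cos.clamp(-1.0, 1.0)).item()

# Hypothetical usage: `model` and `train_one_epoch` are placeholders.
# Snapshot the weights, train for one epoch, then compare directions.
#
#   before = {n: p.detach().clone() for n, p in model.named_parameters()}
#   train_one_epoch(model)
#   for n, p in model.named_parameters():
#       print(n, angular_update(before[n], p.detach()))
#
# Under SGD with learning rate lr and weight decay wd, prior work on
# spherical motion dynamics suggests the equilibrium per-step angular
# update scales like sqrt(2 * lr * wd); the abstract's claim is that good
# hyperparameter setups share a similar *per-epoch* angular update.
```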
Acknowledgment
This research was supported by the Engineering Research Center Program through the National Research Foundation of Korea (NRF), funded by the Korean Government (MSIT) (NRF-2018R1A5A1059921).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yun, J., Lee, J., Shon, H., Yi, E., Kim, S.H., Kim, J. (2022). On the Angular Update and Hyperparameter Tuning of a Scale-Invariant Network. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13672. Springer, Cham. https://doi.org/10.1007/978-3-031-19775-8_8
DOI: https://doi.org/10.1007/978-3-031-19775-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19774-1
Online ISBN: 978-3-031-19775-8
eBook Packages: Computer Science, Computer Science (R0)