
ADINE: an adaptive momentum method for stochastic gradient descent

Published: 11 January 2018

Abstract

Momentum-based methods are among the most successful learning algorithms in both convex and non-convex optimization. Two momentum-based techniques that have achieved tremendous success in gradient-based optimization are Polyak's heavy ball method and Nesterov's accelerated gradient. A crucial step in all momentum-based methods is the choice of the momentum parameter m, which is always set to less than 1. Although the choice of m < 1 is justified only under very strong theoretical assumptions, it works well in practice. In this paper we propose a new momentum-based method, ADINE, which relaxes the constraint m < 1 and allows the learning algorithm to use adaptive higher momentum. We motivate our relaxation on m by experimentally verifying that a higher momentum (≥ 1) can help escape saddle points much faster. ADINE uses this intuition to weigh the previous updates more heavily, inherently setting the momentum parameter of the optimization method to a larger value. To the best of our knowledge, the idea of increased momentum is novel and the first of its kind. We evaluate ADINE on deep neural networks and show that it helps the learning algorithm converge much faster without compromising the generalization error.
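For context, Polyak's heavy ball method maintains a velocity v_{t+1} = m·v_t - η·∇f(w_t) and then takes the step w_{t+1} = w_t + v_{t+1}, with the momentum parameter m set below 1. The minimal sketch below shows this classical update together with an optional hook that scales the momentum coefficient per step, purely to illustrate the idea of weighing previous updates more (momentum ≥ 1). The function names, signatures, and the toy objective are assumptions for illustration; this is not the ADINE update rule, which is not reproduced on this page.

```python
# Heavy-ball momentum SGD (Polyak): v_{t+1} = m*v_t - lr*grad(w_t); w_{t+1} = w_t + v_{t+1}.
# The `adaptive_scale` hook is a hypothetical illustration of momentum >= 1,
# NOT the ADINE schedule proposed in the paper.
import numpy as np

def heavy_ball_sgd(grad_fn, w0, lr=0.01, m=0.9, steps=100, adaptive_scale=None):
    """grad_fn(w) returns a (stochastic) gradient at w;
    adaptive_scale(t), if given, returns a multiplier applied to m (values >= 1 allowed)."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for t in range(steps):
        g = grad_fn(w)
        m_t = m * adaptive_scale(t) if adaptive_scale is not None else m
        v = m_t * v - lr * g   # accumulate (possibly amplified) velocity
        w = w + v              # heavy-ball step
    return w

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
if __name__ == "__main__":
    w_final = heavy_ball_sgd(lambda w: w, [5.0, -3.0], lr=0.1, m=0.9, steps=200)
    print(w_final)  # approaches the minimizer at the origin
```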

References

[1] Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. 2014. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems. 2933--2941.
[2] John Duchi, Elad Hazan, and Yoram Singer. 2010. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Technical Report UCB/EECS-2010-24. EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-24.html
[3] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. 2015. Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition. In Proceedings of the 28th Conference on Learning Theory (Proceedings of Machine Learning Research), Vol. 40. PMLR, 797--842.
[4] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Vol. 9. PMLR, Chia Laguna Resort, Sardinia, Italy.
[5] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
[6] Yurii Nesterov. 1983. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, Vol. 27. 372--376.
[7] Boris Teodorovich Polyak. 1964. Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. and Math. Phys. 4, 5 (1964), 1--17.
[8] Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, and Alex Smola. 2016. Stochastic variance reduction for nonconvex optimization. In International Conference on Machine Learning. 314--323.
[9] M. Riedmiller and H. Braun. 1993. A direct adaptive method for faster backpropagation learning: the RPROP algorithm. In IEEE International Conference on Neural Networks, Vol. 1. 586--591.
[10] Nicolas L. Roux, Mark Schmidt, and Francis R. Bach. 2012. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems. 2663--2671.
[11] Shai Shalev-Shwartz and Tong Zhang. 2013. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research 14, Feb (2013), 567--599.
[12] Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. 2013. On the Importance of Initialization and Momentum in Deep Learning. In Proceedings of the 30th International Conference on Machine Learning (ICML'13). JMLR.org, III-1139--III-1147. http://dl.acm.org/citation.cfm?id=3042817.3043064
[13] T. Tieleman and G. Hinton. 2012. Lecture 6.5---RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012).
[14] Sergey Zagoruyko and Nikos Komodakis. 2016. Wide Residual Networks. In BMVC.
[15] Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012).


Published In

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
379 pages
ISBN:9781450363419
DOI:10.1145/3152494
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2018

Author Tags

  1. deep learning
  2. momentum
  3. neural networks
  4. non-convex optimization

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Human Resource Development, Govt of India
  • Intel India
  • Microsoft Research India

Conference

CoDS-COMAD '18

Acceptance Rates

CoDS-COMAD '18 paper acceptance rate: 50 of 150 submissions (33%).
Overall acceptance rate: 197 of 680 submissions (29%).

Cited By

  • (2022) A Neural Network Algorithm of Learning Rate Adaptive Optimization and Its Application in Emitter Recognition. Simulation Tools and Techniques, 390-402. DOI: 10.1007/978-3-030-97124-3_29. Online publication date: 31-Mar-2022.
  • (2020) Research and analysis on defect detection of semi-conductive layer of high voltage cable. E3S Web of Conferences 185, 01057. DOI: 10.1051/e3sconf/202018501057. Online publication date: 1-Sep-2020.
  • (2020) Electrical performance analysis of 110 kV GIS terminal extension conducting rod. E3S Web of Conferences 185, 01020. DOI: 10.1051/e3sconf/202018501020. Online publication date: 1-Sep-2020.
  • (2019) A Group Recommendation Approach Based on Neural Network Collaborative Filtering. 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW), 148-154. DOI: 10.1109/ICDEW.2019.00-18. Online publication date: Apr-2019.
