Abstract
A fast, adaptive, heuristic stochastic gradient descent method is proposed. The algorithm is shown to converge faster on practical problems than currently popular optimization methods. A justification of the method is also given, and the difficulties that prevent obtaining optimal theoretical estimates for the proposed algorithm are discussed.
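For readers unfamiliar with adaptive stochastic gradient schemes, the sketch below illustrates the general idea of adapting the step size from observed stochastic gradients. It uses an AdaGrad-style per-coordinate rule on a toy least-squares problem; this is a generic illustration of the algorithm class only, not the heuristic adaptive fast gradient method proposed in the paper.

```python
# Illustrative sketch only: a generic adaptive stochastic gradient step
# (AdaGrad-style per-coordinate step sizes). NOT the method of the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic objective: least squares f(x) = E[(a^T x - b)^2] over random rows.
A = rng.standard_normal((1000, 20))
x_true = rng.standard_normal(20)
b = A @ x_true + 0.1 * rng.standard_normal(1000)

def stochastic_grad(x, batch=32):
    """Mini-batch gradient of the least-squares loss at x."""
    idx = rng.integers(0, A.shape[0], size=batch)
    Ab, bb = A[idx], b[idx]
    return 2.0 * Ab.T @ (Ab @ x - bb) / batch

x = np.zeros(20)
g_accum = np.zeros(20)          # running sum of squared gradients
eta, eps = 0.5, 1e-8            # base step size and numerical safeguard

for t in range(2000):
    g = stochastic_grad(x)
    g_accum += g * g
    # Per-coordinate adaptive step: coordinates with large accumulated
    # gradient magnitude get proportionally smaller steps.
    x -= eta * g / (np.sqrt(g_accum) + eps)

print("final error:", np.linalg.norm(x - x_true))
```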
Funding
The work by A. I. Tyurin was supported by the Russian Foundation for Basic Research, project no. 19-31-90062 Aspiranty.
Additional information
Translated by A. Klimontovich
Cite this article
Ogal’tsov, A.V., Tyurin, A.I. A Heuristic Adaptive Fast Gradient Method in Stochastic Optimization Problems. Comput. Math. and Math. Phys. 60, 1108–1115 (2020). https://doi.org/10.1134/S0965542520070088