Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

A Heuristic Adaptive Fast Gradient Method in Stochastic Optimization Problems

  • Published:
Computational Mathematics and Mathematical Physics Aims and scope Submit manuscript

Abstract

A fast adaptive heuristic stochastic gradient descent method is proposed. It is shown that this algorithm has a higher convergence rate in practical problems than currently popular optimization methods. Furthermore, a justification of this method is given, and difficulties that prevent obtaining optimal estimates for the proposed algorithm are described.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1.

Similar content being viewed by others

REFERENCES

  1. Yu. E. Nesterov, Introduction to Convex Optimization (Mosk. Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow, 2010) [in Russian].

    Google Scholar 

  2. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, Cambridge, Mass., 2016).

    MATH  Google Scholar 

  3. A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems (2012), pp. 1097–1105.

  4. A. V. Gasnikov, P. E. Dvurechensky, and I. N. Usmanova, “On the nontriviality of fast randomized methods,” Trudy Mosk. Fiz.-Teknh. Inst. 8 (2), 67–100 (2016).

    Google Scholar 

  5. A. V. Gasnikov, Modern Numerical Optimization Methods: The Universal Gradient Descent Method (Mosc. Fiziko-tekhnicheskii Institut, 2018). arXiv:1711.00394

  6. F. Bach and K. Y. Levy, “A universal algorithm for variational inequalities adaptive to smoothness and noise,” COLT, 2019.

    Google Scholar 

  7. S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, and S. Lacoste-Julien, “Painless stochastic gradient: Interpolation, line-search, and convergence rates,” NIPS, 2019.

    Google Scholar 

  8. J. Nocedal and S. Wright, Numerical Optimization (Springer Science, 2006).

    MATH  Google Scholar 

  9. R. Ward, X. Wu, and L. Bottou, “AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization,” ICML, 2019.

    Google Scholar 

  10. J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learning Res. 12, 2121–2159 (2011).

    MathSciNet  MATH  Google Scholar 

  11. Q. Deng, Y. Cheng, and G. Lan, “Optimal adaptive and accelerated stochastic gradient descent,” 2018. arXiv:1810.00553.

  12. K. Y. Levy, A. Yurtsever, and V. Cevher, “Online adaptive methods, universality and acceleration,” NIPS, 2018.

    Google Scholar 

  13. A. N. Iusem, A. Jofre, R. I. Oliveira, and P. Thompson, “Variance-based extragradient methods with line search for stochastic variational inequalities,” SIAM J. Optim. 29, 175–206 (2019).

    Article  MathSciNet  Google Scholar 

  14. S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory Of Independence (Oxford Univ. Press, 2013).

    Book  Google Scholar 

  15. D. Panchenko, “Symmetrization approach to concentration inequalities for empirical processes,” Annals Probab. 31, 2068–2081 (2003).

    Article  MathSciNet  Google Scholar 

  16. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” ICLR, 2015.

    Google Scholar 

  17. M. D. Gupta and T. Huang, “Bregman distance to l1 regularized logistic regression,” in 19th International Conference on Pattern Recognition, 2008, pp. 1–4.

  18. O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods with inexact oracle: The strongly convex case,” CORE Discussion Paper 2013/16. 2013. https://www.uclouvain.be/cps/ucl/doc/core/documents/coredp2013_16web.pdf

  19. O. Devolder, F. Glineur, and Yu. Nesterov, “First-order methods of smooth convex optimization with inexact oracle,” Math. Program. 146, 37–75 (2014).

    Article  MathSciNet  Google Scholar 

  20. O. Devolder, “Exactness, inexactness and stochasticity in first-order methods for largescale convex optimization,” PhD thesis, CORE UCL, 2013.

  21. M. Li, T. Zhang, Y. Chen, and A. J. Smola, “Efficient mini-batch training for stochastic optimization,” ACM, 2014.

    Book  Google Scholar 

  22. A. Juditsky and A. Nemirovski, “Large deviations of vector-valued martingales in 2-smooth normed spaces,” 2008. arXiv:0809.0813

  23. A. V. Gasnikov and A. I. Tyurin, “Fast gradient descent for convex minimization problems with an oracle producing a (δ; L)-model of function at the requested point,” Comput. Math. Math. Phys. 59, 1085–1097 (2019).

    Article  MathSciNet  Google Scholar 

  24. C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Science, 2013).

    MATH  Google Scholar 

  25. G. Lan, A. Nemirovski, and A. Shapiro, “Validation analysis of mirror descent stochastic approximation method,” Math. Program. 134, 425–458 (2012).

    Article  MathSciNet  Google Scholar 

  26. A. Griewank, “On automatic differentiation,” Math. Program: Recent Developments Appl. 6 (6), 83–107 (1989).

    Google Scholar 

  27. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86, 2278–2324 (1998).

    Article  Google Scholar 

  28. A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, PhD thesis, University of Toronto, 2009.

Download references

Funding

The work by A. I. Tyurin was supported by the Russian Foundation for Basic Research, project no. 19-31-90062 Aspiranty.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to A. I. Tyurin.

Additional information

Translated by A. Klimontovich

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ogal’tsov, A.V., Tyurin, A.I. A Heuristic Adaptive Fast Gradient Method in Stochastic Optimization Problems. Comput. Math. and Math. Phys. 60, 1108–1115 (2020). https://doi.org/10.1134/S0965542520070088

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1134/S0965542520070088

Keywords:

Navigation