Abstract
In this paper, we provide an overview of first-order and second-order variants of the gradient descent method commonly used in machine learning. We propose a general framework in which six of these variants can be interpreted as different instances of the same approach: vanilla gradient descent, the classical and generalized Gauss-Newton methods, natural gradient descent, the gradient covariance matrix approach, and Newton's method. Besides interpreting these methods within a single framework, we explain their specificities and show under which conditions some of them coincide.
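To make the shared structure concrete, here is a minimal sketch of the unifying template, written in our own notation rather than the paper's (the paper's exact derivation may differ): each of the six variants can be read as a preconditioned update

\[
\delta\theta \;=\; -\,\eta\, M(\theta)^{-1}\, \nabla_\theta L(\theta),
\]

where $L$ is the cost, $\eta > 0$ a step size, and $M(\theta)$ a positive-definite matrix whose choice selects the method: $M = I$ yields vanilla gradient descent; $M = J^\top J$, with $J$ the Jacobian of the model outputs, the classical Gauss-Newton method; $M = J^\top H J$, with $H$ the Hessian of the loss with respect to the outputs, the generalized Gauss-Newton method; $M = F$, the Fisher information matrix, natural gradient descent; $M = \mathbb{E}\left[\nabla_\theta \ell\, \nabla_\theta \ell^\top\right]$, the outer product of per-sample gradients, the gradient covariance matrix approach; and $M = \nabla_\theta^2 L$, the Hessian of the cost, Newton's method.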
T. Pierrot and N. Perrin-Gilbert contributed equally.
Notes
1. This context helps to simplify notation and give examples, but the results obtained are not specific to this setting.
Acknowledgements
This research was partially supported by the French National Research Agency (ANR), Project ANR-18-CE33-0005 HUSKI.