
A Stochastic Quasi-Newton Method for Large-Scale Optimization

Published: 01 January 2016

Abstract

The question of how to incorporate curvature information into stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust, and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that it is beneficial to collect curvature information pointwise, and at spaced intervals. One way to do this is through (subsampled) Hessian-vector products. This technique differs from the classical approach that would compute differences of gradients at every iteration, and where controlling the quality of the curvature estimates can be difficult. We present numerical results on problems arising in machine learning that suggest that the proposed method shows much promise.
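The abstract describes the mechanism only at a high level. Below is a minimal sketch in Python/NumPy of the kind of iteration it outlines: SGD-style steps scaled by a limited-memory BFGS matrix whose curvature pairs are refreshed only at spaced intervals from a (subsampled) Hessian-vector product, rather than from gradient differences at every step. This is not the authors' reference implementation; the callables `stoch_grad` and `subsampled_hess_vec`, the step-size schedule, and the constants are illustrative placeholders.

```python
# Sketch (not the authors' reference code) of a stochastic limited-memory BFGS
# iteration in the spirit of the abstract: curvature pairs are collected
# pointwise, every L iterations, via a subsampled Hessian-vector product.
import numpy as np


def lbfgs_two_loop(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns an approximation of H * grad."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # Scale by gamma = s'y / y'y from the most recent pair (identity if no pairs yet).
    gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1]) if y_list else 1.0
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return r


def stochastic_qn(stoch_grad, subsampled_hess_vec, x0,
                  n_iters=1000, L=10, memory=10, alpha0=0.1):
    """stoch_grad(x): mini-batch gradient at x.
    subsampled_hess_vec(x, v): product of a subsampled Hessian at x with v."""
    x = np.asarray(x0, dtype=float).copy()
    s_list, y_list = [], []
    x_avg = np.zeros_like(x)
    x_avg_prev = None
    for t in range(1, n_iters + 1):
        g = stoch_grad(x)
        d = lbfgs_two_loop(g, s_list, y_list) if s_list else g
        x -= (alpha0 / t) * d          # diminishing step size (placeholder schedule)
        x_avg += x / L                 # running average of the last L iterates
        if t % L == 0:                 # collect curvature pointwise, at spaced intervals
            if x_avg_prev is not None:
                s = x_avg - x_avg_prev
                y = subsampled_hess_vec(x_avg, s)   # (subsampled) Hessian-vector product
                if np.dot(s, y) > 1e-10:            # keep the pair only if curvature is positive
                    s_list.append(s)
                    y_list.append(y)
                    if len(s_list) > memory:        # limited-memory window
                        s_list.pop(0)
                        y_list.pop(0)
            x_avg_prev, x_avg = x_avg, np.zeros_like(x)
    return x
```

Because the curvature pairs are refreshed only every L iterations, the cost of the (subsampled) Hessian-vector product is amortized over many cheap stochastic gradient steps, which is what the abstract means by collecting curvature information pointwise and at spaced intervals.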



Published In

SIAM Journal on Optimization, Volume 26, Issue 2
2016
519 pages
ISSN: 1052-6234
DOI: 10.1137/sjope8.26.2

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. stochastic optimization
  2. quasi-Newton
  3. subsampling
  4. large-scale optimization

MSC Codes

  1. 65K05
  2. 90C06
  3. 90C30
  4. 90C55

Qualifiers

  • Research-article


Cited By

  • (2024) PETScML: Second-Order Solvers for Training Regression Problems in Scientific Machine Learning, Proceedings of the Platform for Advanced Scientific Computing Conference, pp. 1-12. DOI: 10.1145/3659914.3659931. Online publication date: 3-Jun-2024.
  • (2024) Convergence Analysis of an Adaptively Regularized Natural Gradient Method, IEEE Transactions on Signal Processing, 72, pp. 2527-2542. DOI: 10.1109/TSP.2024.3398496. Online publication date: 9-May-2024.
  • (2024) Stochastic Steffensen method, Computational Optimization and Applications, 89:1, pp. 1-32. DOI: 10.1007/s10589-024-00583-7. Online publication date: 7-Jun-2024.
  • (2024) The continuous stochastic gradient method: part I–convergence theory, Computational Optimization and Applications, 87:3, pp. 935-976. DOI: 10.1007/s10589-023-00542-8. Online publication date: 1-Apr-2024.
  • (2023) Searching for optimal per-coordinate step-sizes with multidimensional backtracking, Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 2725-2767. DOI: 10.5555/3666122.3666244. Online publication date: 10-Dec-2023.
  • (2023) Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning, IEEE Transactions on Signal Processing, 71, pp. 311-326. DOI: 10.1109/TSP.2023.3240652. Online publication date: 1-Jan-2023.
  • (2023) An Efficient Fisher Matrix Approximation Method for Large-Scale Neural Network Optimization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 45:5, pp. 5391-5403. DOI: 10.1109/TPAMI.2022.3213654. Online publication date: 1-May-2023.
  • (2023) Adaptive step size rules for stochastic optimization in large-scale learning, Statistics and Computing, 33:2. DOI: 10.1007/s11222-023-10218-2. Online publication date: 20-Feb-2023.
  • (2023) Secant penalized BFGS: a noise robust quasi-Newton method via penalizing the secant condition, Computational Optimization and Applications, 84:3, pp. 651-702. DOI: 10.1007/s10589-022-00448-x. Online publication date: 9-Jan-2023.
  • (2023) Resource-Adaptive Newton's Method for Distributed Learning, Computing and Combinatorics, pp. 335-346. DOI: 10.1007/978-3-031-49190-0_24. Online publication date: 15-Dec-2023.
