A Variable Sample-Size Stochastic Quasi-Newton Method for Smooth and Nonsmooth Stochastic Convex Optimization

Published: 01 February 2022

Abstract

Classical theory for quasi-Newton schemes has focused on smooth, deterministic, unconstrained optimization, whereas recent forays into stochastic convex optimization have largely resided in smooth, unconstrained, and strongly convex regimes. Naturally, there is a compelling need to address nonsmoothness, the lack of strong convexity, and the presence of constraints. Accordingly, this paper presents a quasi-Newton framework that can process merely convex and possibly nonsmooth (but smoothable) stochastic convex problems. We propose a framework that combines iterative smoothing and regularization with a variance-reduced scheme reliant on using an increasing sample size of gradients. We make the following contributions. (i) We develop a regularized and smoothed variable sample-size BFGS update (rsL-BFGS) that generates a sequence of Hessian approximations and can accommodate nonsmooth convex objectives by utilizing iterative regularization and smoothing. (ii) In strongly convex regimes with state-dependent noise, the proposed variable sample-size stochastic quasi-Newton (VS-SQN) scheme admits a nonasymptotic linear rate of convergence, whereas the oracle complexity of computing an ϵ-solution is O(κ^{m+1}/ϵ), where κ denotes the condition number and m ≥ 1. In nonsmooth (but smoothable) regimes, using Moreau smoothing retains the linear convergence rate for the resulting smoothed VS-SQN (or sVS-SQN) scheme. Notably, the nonsmooth regime allows for accommodating convex constraints. To contend with the possible unavailability of Lipschitzian and strong convexity parameters, we also provide sublinear rates for diminishing step-length variants that do not rely on the knowledge of such parameters. (iii) In merely convex but smooth settings, the regularized VS-SQN scheme rVS-SQN displays a rate of O(1/k^{1−ϵ}) with an oracle complexity of O(1/ϵ^{3}). When the smoothness requirements are weakened, the rate for the regularized and smoothed VS-SQN scheme rsVS-SQN worsens to O(k^{−1/3}). Such statements allow for a state-dependent noise assumption under a quadratic growth property on the objective. To the best of our knowledge, the rate results are among the first available rates for QN methods in nonsmooth regimes. Preliminary numerical evidence suggests that the schemes compare well with accelerated gradient counterparts on selected problems in stochastic optimization and machine learning with significant benefits in ill-conditioned regimes.
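
To make the ingredients named above concrete, the sketch below combines a geometrically increasing batch (sample) size, a diminishing regularization parameter mu_k, Huber-type (Moreau-envelope) smoothing of a nonsmooth l1 term with a diminishing smoothing parameter eta_k, and a limited-memory BFGS two-loop recursion. This is a minimal illustrative sketch, not the paper's VS-SQN/rsVS-SQN algorithms or their parameter rules: the synthetic data model, function names, step size, and all constants are assumptions made for the example, and the simple curvature-pair skip rule stands in for the regularized rsL-BFGS update analyzed in the paper.

```python
import numpy as np


def sample_grad(x, rng, batch_size):
    """Sample-average gradient of a synthetic stochastic least-squares loss
    f(x, xi) = 0.5 * (a_xi @ x - b_xi)^2 (a hypothetical stand-in for the
    expectation-valued objective)."""
    dim = x.size
    A = rng.standard_normal((batch_size, dim))
    b = A @ np.ones(dim) + 0.1 * rng.standard_normal(batch_size)
    return A.T @ (A @ x - b) / batch_size


def huber_grad(x, eta):
    """Gradient of the Huber (Moreau-type) smoothing of ||x||_1 with
    smoothing parameter eta > 0."""
    return np.clip(x / eta, -1.0, 1.0)


def lbfgs_direction(g, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns an approximation of H_k g."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)              # standard initial Hessian scaling
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        beta = (y @ q) / (y @ s)
        q += (a - beta) * s
    return q


def rs_vs_sqn(dim=50, iters=40, memory=5, step=0.5, N0=4, growth=1.3,
              lam=0.1, mu0=1e-2, eta0=1.0, seed=0):
    """Illustrative regularized/smoothed variable sample-size SQN loop:
    geometrically growing batch sizes, a diminishing regularizer mu_k, and a
    diminishing smoothing parameter eta_k for the nonsmooth l1 term."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    s_list, y_list = [], []
    x_old = g_old = None
    for k in range(iters):
        N_k = int(np.ceil(N0 * growth ** k))      # increasing sample size
        mu_k = mu0 / (k + 1)                      # iterative regularization
        eta_k = eta0 / (k + 1)                    # iterative smoothing
        g = sample_grad(x, rng, N_k) + lam * huber_grad(x, eta_k) + mu_k * x
        if x_old is not None:
            s, y = x - x_old, g - g_old
            if y @ s > 1e-10:                     # keep the update well defined
                s_list.append(s)
                y_list.append(y)
                if len(s_list) > memory:
                    s_list.pop(0)
                    y_list.pop(0)
        x_old, g_old = x.copy(), g
        x -= step * lbfgs_direction(g, s_list, y_list)
    return x


if __name__ == "__main__":
    x_hat = rs_vs_sqn()
    print("distance to the synthetic least-squares solution:",
          np.linalg.norm(x_hat - np.ones(50)))
```

In the actual schemes, the batch-size growth rate and the sequences mu_k and eta_k are tied to the convexity and smoothness assumptions, which is what yields the linear and sublinear rates quoted in the abstract.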


        Published In

        Mathematics of Operations Research, Volume 47, Issue 1
        February 2022
        847 pages
        ISSN: 0364-765X
        DOI: 10.1287/moor.2022.47.issue-1

        Publisher

        INFORMS

        Linthicum, MD, United States

        Publication History

        Published: 01 February 2022
        Accepted: 20 November 2020
        Received: 31 December 2018

        Author Tags

        1. Primary: 90C15
        2. Secondary: 90C25, 90C53

        Author Tags

        1. stochastic optimization
        2. convex optimization
        3. quasi-Newton methods

        Qualifiers

        • Research-article
