
Sub-linear convergence of a stochastic proximal iteration method in Hilbert space

Published: 01 September 2022

Abstract

We consider a stochastic version of the proximal point algorithm for convex optimization problems posed on a Hilbert space. A typical application of this is supervised learning. While the method is not new, it has not been extensively analyzed in this form. Indeed, most related results are confined to the finite-dimensional setting, where error bounds could depend on the dimension of the space. On the other hand, the few existing results in the infinite-dimensional setting only prove very weak types of convergence, owing to weak assumptions on the problem. In particular, there are no results that show strong convergence with a rate. In this article, we bridge these two worlds by assuming more regularity of the optimization problem, which allows us to prove convergence with an (optimal) sub-linear rate also in an infinite-dimensional setting. In particular, we assume that the objective function is the expected value of a family of convex differentiable functions. While we require that the full objective function is strongly convex, we do not assume that its constituent parts are so. Further, we require that the gradient satisfies a weak local Lipschitz continuity property, where the Lipschitz constant may grow polynomially given certain guarantees on the variance and higher moments near the minimum. We illustrate these results by discretizing a concrete infinite-dimensional classification problem with varying degrees of accuracy.




        Published In

Computational Optimization and Applications, Volume 83, Issue 1 (September 2022), 375 pages

        Publisher

Kluwer Academic Publishers, United States

        Publication History

        Published: 01 September 2022
        Accepted: 19 May 2022
        Received: 24 September 2021

        Author Tags

        1. Stochastic proximal point
        2. Convergence analysis
        3. Convergence rate
        4. Infinite-dimensional
        5. Hilbert space

Mathematics Subject Classification

        1. 46N10
        2. 65K10
        3. 90C15

        Qualifiers

        • Research-article

