Abstract
Inspired by the fact that certain randomization schemes incorporated into stochastic (proximal) gradient methods allow for a large reduction in computational time, we incorporate such a scheme into the stochastic alternating direction method of multipliers (ADMM), yielding a faster stochastic alternating direction method (FSADM) for solving a class of large-scale convex composite problems. In numerical experiments, we observe that this method reduces computational time compared with previous methods. More importantly, we unify the stochastic ADMM for solving general convex and strongly convex composite problems (i.e., the iterative scheme does not change when the problem goes from strongly convex to general convex). In addition, we establish the convergence rates of FSADM for these two cases.
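As background, a common template behind such stochastic ADMM methods is a linearly constrained composite model with a finite-sum smooth term, solved by a linearized stochastic ADMM step with a variance-reduced gradient estimate. The following is a generic sketch in our own notation (the penalty parameter \(\rho\), step size \(\eta\), scaled multiplier \(u_t\), and gradient estimate \(v_t \approx \nabla f(x_{t-1})\) are introduced here for illustration and need not match the paper's exact FSADM scheme):
\[
\min_{x,z}\ f(x) + g(z)\quad \text{s.t.}\quad Ax + Bz = c,\qquad f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x),
\]
\[
\begin{aligned}
x_t &= \operatorname*{arg\,min}_{x}\Big\{ v_t^{T}x + \frac{\rho}{2}\|Ax + Bz_{t-1} - c + u_{t-1}\|^{2} + \frac{1}{2\eta}\|x - x_{t-1}\|^{2} \Big\},\\
z_t &= \operatorname*{arg\,min}_{z}\Big\{ g(z) + \frac{\rho}{2}\|Ax_t + Bz - c + u_{t-1}\|^{2} \Big\},\\
u_t &= u_{t-1} + Ax_t + Bz_t - c.
\end{aligned}
\]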
Acknowledgements
This work was supported by the National Natural Science Foundation of China (11731013, 11991022) and by the Project of Promoting Scientific Research Ability of Excellent Young Teachers supported by the University of Chinese Academy of Sciences.
Appendix A: Proof of Lemma 2
Proof
According to the optimality condition of subproblem (5), we have
Rearranging the above equality, we get
Define the function
then
where \(\operatorname{prox}_{\theta}(\cdot)\) is the proximal operator of the function \(\theta\). By the \(\mu_f\)-convexity of \(f\) (see the standard inequalities recalled after the proof), we have for any \(x \in \operatorname{dom}(f)\),
by the smoothness of \(f\), we can further lower bound \(f(x_{t-1})\) by
Therefore
Collecting all inner products on the right-hand side and adding the term \((g_t - v_t)^{T}(x - x_t)\), we have
Hence, using \(g_t = w_t + v_t\) and \(\eta \le \frac{1}{L_f}\), we obtain
□
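For completeness, the standard facts invoked in this proof, stated in generic form (the notation is ours; the precise points at which they are applied appear in the displayed inequalities above), are
\[
\operatorname{prox}_{\theta}(y) = \operatorname*{arg\,min}_{x}\Big\{\theta(x) + \frac{1}{2}\|x - y\|^{2}\Big\},
\]
\[
f(x) \ge f(y) + \nabla f(y)^{T}(x - y) + \frac{\mu_f}{2}\|x - y\|^{2},\qquad
f(x) \ge f(y) + \nabla f(y)^{T}(x - y) - \frac{L_f}{2}\|x - y\|^{2},
\]
for all \(x, y \in \operatorname{dom}(f)\). The first inequality is \(\mu_f\)-(strong) convexity, with \(\mu_f = 0\) in the general convex case; the second is the lower bound implied by \(L_f\)-smoothness (the descent lemma).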
About this article
Cite this article
Hu, J., Guo, T. & Zhao, T. A faster stochastic alternating direction method for large scale convex composite problems. Appl Intell 52, 14233–14245 (2022). https://doi.org/10.1007/s10489-022-03319-4