Abstract
Consider the problem of minimizing the expected value of a cost function parameterized by a random variable. The classical sample average approximation method for solving this problem requires minimization of an ensemble average of the objective at each step, which can be expensive. In this paper, we propose a stochastic successive upper-bound minimization method (SSUM) that minimizes an approximate ensemble average at each iteration. To ensure convergence and to facilitate computation, we require the approximate ensemble average to be a locally tight upper bound of the expected cost function that is easy to optimize. The main contributions of this work are the development and analysis of the SSUM method as well as its applications in linear transceiver design for wireless communication networks and in online dictionary learning. Moreover, using the SSUM framework, we extend the classical stochastic (sub-)gradient method to the minimization of a nonsmooth nonconvex objective function and establish its convergence.
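In schematic form (a sketch using the notation of the appendix below; the precise statement of the algorithm is given in the paper body), SSUM draws an i.i.d. sample \(\xi ^r\) at iteration r and minimizes the running average of the surrogate functions,
\[
x^{r} \in \mathop {\mathrm {arg\,min}}\limits _{x \in {\mathcal {X}}} \; {\hat{f}}^{r}(x) \triangleq \frac{1}{r}\sum _{i=1}^{r} {\hat{f}}\left( x, x^{i-1}, \xi ^{i}\right) ,
\]
where \({\hat{f}}(\cdot , y, \xi )\) is a locally tight, easily optimized upper bound of the sample cost \(g(\cdot , \xi )\) at the point y.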
Notes
The family of functions \(\mathcal {F}\) is uniformly equicontinuous over the set \({\mathcal {X}}\) if for every \(\epsilon >0\), there exists a \(\delta >0\) such that \(|f(x) - f(y)| <\epsilon , \;\forall f \in \mathcal {F}\) and for all \(x ,y \in {\mathcal {X}}\) with \(|x - y| <\delta \).
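For example, any family of functions sharing a common Lipschitz constant \(L>0\) over \({\mathcal {X}}\) is uniformly equicontinuous: given \(\epsilon >0\), the single choice \(\delta = \epsilon /L\) works simultaneously for every member of the family.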
References
Plambeck, E.L., Fu, B.R., Robinson, S.M., Suri, R.: Sample-path optimization of convex stochastic performance functions. Math. Program. 75(2), 137–176 (1996)
Robinson, S.M.: Analysis of sample-path optimization. Math. Oper. Res. 21(3), 513–528 (1996)
Healy, K., Schruben, L.W.: Retrospective simulation response optimization. In: Proceedings of the 23rd Conference on Winter Simulation, pp. 901–906. IEEE Computer Society (1991)
Rubinstein, R.Y., Shapiro, A.: Optimization of static simulation models by the score function method. Math. Comput. Simul. 32(4), 373–392 (1990)
Rubinstein, R.Y., Shapiro, A.: Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method, vol. 346. Wiley, New York (1993)
Shapiro, A.: Monte Carlo sampling methods. Handb. Oper. Res. Manag. Sci. 10, 353–426 (2003)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory, vol. 9. Society for Industrial and Applied Mathematics, Philadelphia (2009)
Kim, S., Pasupathy, R., Henderson, S.: A guide to sample average approximation. In: Fu, M.C. (ed.) Handbook of Simulation Optimization, pp. 207–243. Springer, New York (2015)
Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimization methods for non-smooth optimization. arXiv preprint, arXiv:1209.2385 (2012)
Yuille, A.L., Rangarajan, A.: The concave–convex procedure. Neural Comput. 15, 915–936 (2003)
Borman, S.: The expectation maximization algorithm—a short tutorial. Unpublished paper. http://ftp.csd.uwo.ca/faculty/olga/Courses/Fall2006/Papers/EM_algorithm.pdf
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
Fristedt, B.E., Gray, L.F.: A Modern Approach to Probability Theory. Birkhäuser, Boston (1996)
Dunford, N., Schwartz, J.T.: Linear Operators. Part 1: General Theory. Interscience Publications, New York (1958)
Bastin, F., Cirillo, C., Toint, P.L.: Convergence theory for nonconvex stochastic programming with an application to mixed logit. Math. Program. 108(2–3), 207–234 (2006)
Larsson, E., Jorswieck, E.: Competition versus cooperation on the MISO interference channel. IEEE J. Sel. Areas Commun. 26, 1059–1069 (2008)
Razaviyayn, M., Luo, Z.Q., Tseng, P., Pang, J.S.: A Stackelberg game approach to distributed spectrum management. Math. Program. 129, 197–224 (2011)
Bengtsson, M., Ottersten, B.: Optimal and suboptimal transmit beamforming. In: Godara, L.C. (ed.) Handbook of Antennas in Wireless Communications. CRC Press, Boca Raton (2001)
Shi, C., Berry, R.A., Honig, M.L.: Local interference pricing for distributed beamforming in MIMO networks. In: Military Communications Conference, MILCOM, pp. 1–6 (2009)
Kim, S.J., Giannakis, G.B.: Optimal resource allocation for MIMO ad hoc cognitive radio networks. IEEE Trans. Inf. Theory 57, 3117–3131 (2011)
Shi, Q., Razaviyayn, M., Luo, Z.Q., He, C.: An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel. IEEE Trans. Signal Process. 59, 4331–4340 (2011)
Razaviyayn, M., Sanjabi, M., Luo, Z.Q.: Linear transceiver design for interference alignment: complexity and computation. IEEE Trans. Inf. Theory 58, 2896–2910 (2012)
Scutari, G., Facchinei, F., Song, P., Palomar, D.P., Pang, J.S.: Decomposition by partial linearization: parallel optimization of multi-agent systems. arXiv preprint, arXiv:1302.0756 (2013)
Scutari, G., Palomar, D.P., Facchinei, F., Pang, J.S.: Distributed dynamic pricing for MIMO interfering multiuser systems: a unified approach. In: 2011 5th International Conference on Network Games, Control and Optimization (NetGCooP), pp. 1–5 (2011)
Hong, M., Luo, Z.Q.: Signal processing and optimal resource allocation for the interference channel. arXiv preprint, arXiv:1206.5144 (2012)
Wajid, I., Eldar, Y.C., Gershman, A.: Robust downlink beamforming using covariance channel state information. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 2285–2288 (2009)
Vucic, N., Boche, H.: Downlink precoding for multiuser MISO systems with imperfect channel knowledge. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 3121–3124 (2008)
Song, E., Shi, Q., Sanjabi, M., Sun, R., Luo, Z.Q.: Robust SINR-constrained MISO downlink beamforming: When is semidefinite programming relaxation tight? In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, pp. 3096–3099 (2011)
Tajer, A., Prasad, N., Wang, X.: Robust linear precoder design for multi-cell downlink transmission. IEEE Trans. Signal Process. 59, 235–251 (2011)
Shenouda, M., Davidson, T.N.: On the design of linear transceivers for multiuser systems with channel uncertainty. IEEE J. Sel. Areas Commun. 26, 1015–1024 (2008)
Li, W.C., Chang, T.H., Lin, C., Chi, C.Y.: Coordinated beamforming for multiuser MISO interference channel under rate outage constraints. IEEE Trans. Signal Process. 61(5), 1087–1103 (2013)
Negro, F., Ghauri, I., Slock, D.: Sum rate maximization in the noisy MIMO interfering broadcast channel with partial CSIT via the expected weighted MSE. In: International Symposium on Wireless Communication Systems, ISWCS, pp. 576–580 (2012)
Razaviyayn, M., Baligh, H., Callard, A., Luo, Z.Q.: Joint transceiver design and user grouping in a MIMO interfering broadcast channel. In: 45th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6 (2011)
Guo, D., Shamai, S., Verdú, S.: Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inf. Theory 51(4), 1261–1282 (2005)
Sampath, H., Stoica, P., Paulraj, A.: Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion. IEEE Trans. Commun. 49(12), 2198–2206 (2001)
Hong, M., Sun, R., Baligh, H., Luo, Z.Q.: Joint base station clustering and beamformer design for partial coordinated transmission in heterogeneous networks. IEEE J. Sel. Areas Commun. 31(2), 226–240 (2013)
3GPP TR 36.814. http://www.3gpp.org/ftp/specs/archive/36_series/36.814/
Yousefian, F., Nedić, A., Shanbhag, U.V.: On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica 48(1), 56–67 (2012)
Razaviyayn, M., Sanjabi, M., Luo, Z.Q.: A stochastic weighted MMSE approach to sum rate maximization for a MIMO interference channel. In: IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 325–329 (2013)
Luo, Z.Q., Zhang, S.: Dynamic spectrum management: complexity and duality. IEEE J. Sel. Top. Signal Process. 2(1), 57–73 (2008)
Weeraddana, P.C., Codreanu, M., Latva-aho, M., Ephremides, A., Fischione, C.: Weighted Sum-Rate Maximization in Wireless Networks: A Review. Now Publishers, Hanover (2012)
Christensen, S.S., Agarwal, R., Carvalho, E., Cioffi, J.M.: Weighted sum-rate maximization using weighted MMSE for MIMO-BC beamforming design. IEEE Trans. Wirel. Commun. 7(12), 4792–4799 (2008)
Schmidt, D.A., Shi, C., Berry, R.A., Honig, M.L., Utschick, W.: Minimum mean squared error interference alignment. In: Forty-Third Asilomar Conference on Signals, Systems and Computers, pp. 1106–1110 (2009)
Negro, F., Shenoy, S.P., Ghauri, I., Slock, D.T.: On the MIMO interference channel. In: Information Theory and Applications Workshop (ITA), pp. 1–9 (2010)
Shin, J., Moon, J.: Weighted sum rate maximizing transceiver design in MIMO interference channel. In: Global Telecommunications Conference (GLOBECOM), pp. 1–5 (2011)
Aharon, M., Elad, M., Bruckstein, A.: K-SVD: design of dictionaries for sparse representation. Proc. SPARS 5, 9–12 (2005)
Lewicki, M.S., Sejnowski, T.J.: Learning overcomplete representations. Neural Comput. 12(2), 337–365 (2000)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena-Scientific, Belmont (1999)
Razaviyayn, M., Tseng, H.W., Luo, Z.Q.: Dictionary learning for sparse representation: Complexity and algorithms. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5247–5251 (2014)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Kiefer, J., Wolfowitz, J.: Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23(3), 462–466 (1952)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Koshal, J., Nedić, A., Shanbhag, U.V.: Regularized iterative stochastic approximation methods for stochastic variational inequality problems. IEEE Trans. Autom. Control 58(3), 594–609 (2013)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)
Ermoliev, Y.: Stochastic quasigradient methods and their application to system optimization. Stoch. Int. J. Probab. Stoch. Process. 9(1–2), 1–36 (1983)
Amari, S.: A theory of adaptive pattern classifiers. IEEE Trans. Electron. Comput. 16(3), 299–307 (1967)
Wijnhoven, R., de With, P.H.N.: Fast training of object detection using stochastic gradient descent. In: Proceedings of IEEE International Conference on Pattern Recognition (ICPR), pp. 424–427 (2010)
Grippo, L.: Convergent on-line algorithms for supervised learning in neural networks. IEEE Trans. Neural Netw. 11(6), 1284–1299 (2000)
Mangasarian, O.L., Solodov, M.V.: Serial and parallel backpropagation convergence via nonmonotone perturbed minimization. Optim. Methods Softw. 4(2), 103–116 (1994)
Luo, Z.Q.: On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Comput. 3(2), 226–245 (1991)
Luo, Z.Q., Tseng, P.: Analysis of an approximate gradient projection method with applications to the backpropagation algorithm. Optim. Methods Softw. 4(2), 85–101 (1994)
Bottou, L.: Online learning and stochastic approximations. In: On-Line Learning in Neural Networks, vol. 17(9) (1998)
Bertsekas, D.P.: A new class of incremental gradient methods for least squares problems. SIAM J. Optim. 7(4), 913–926 (1997)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena-Scientific, Belmont (1999)
Tsitsiklis, J., Bertsekas, D.P., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)
Bertsekas, D.P.: Distributed asynchronous computation of fixed points. Math. Program. 27(1), 107–120 (1983)
Ermol’ev, Y.M., Norkin, V.I.: Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. Cybern. Syst. Anal. 34(2), 196–215 (1998)
Bertsekas, D.P., Tsitsiklis, J.N.: Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3), 627–642 (2000)
Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Optimization for Machine Learning 2010, pp. 1–38 (2011)
Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 8(2), 506–531 (1998)
George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach. Learn. 65(1), 167–198 (2006)
Broadie, M., Cicek, D., Zeevi, A.: General bounds and finite-time improvement for the Kiefer–Wolfowitz stochastic approximation algorithm. Oper. Res. 59(5), 1211–1224 (2011)
Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)
Xiao, L.: Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res. 11, 2543–2596 (2010)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(\ell _1\)-regularized loss minimization. J. Mach. Learn. Res. 12, 1865–1892 (2011)
Fisk, D.L.: Quasi-martingales. Trans. Am. Math. Soc. 120, 369–389 (1965)
Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)
Hewitt, E., Savage, L.J.: Symmetric measures on cartesian products. Trans. Am. Math. Soc. 80, 470–501 (1955)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Acknowledgments
The authors are grateful to Maury Bramson of the University of Minnesota for valuable discussions.
Appendix
Proof of Lemma 1
First of all, it can be observed that \({\hat{f}}_1^r(x^r) - f_1^r(x^r) = {\hat{f}}^r(x^r) - f^r(x^r)\) according to the definition of the functions \({\hat{f}}(\cdot )\) and \(f(\cdot )\). Therefore, it suffices to show that \(\lim _{r\rightarrow \infty } {\hat{f}}^r(x^r) - f^r(x^r)=0,\;\mathrm{almost \;surely}\). The proof of this fact relies on the quasi-martingale convergence theorem [80], much like the convergence proofs of online learning algorithms [50, Proposition 3]. In particular, we will show that the sequence \(\{{\hat{f}}^r(x^r)\}_{r=1}^{\infty }\) converges almost surely. Notice that
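(the display below is a reconstruction inferred from the explanation that follows; cf. the analogous step in [50, Proposition 3])
\[
{\hat{f}}^{r+1}(x^{r+1}) \le {\hat{f}}^{r+1}(x^{r}) = \frac{r\, {\hat{f}}^{r}(x^{r}) + {\hat{f}}\left( x^{r}, x^{r}, \xi ^{r+1}\right) }{r+1} = \frac{r\, {\hat{f}}^{r}(x^{r}) + g\left( x^{r}, \xi ^{r+1}\right) }{r+1}, \tag{43}
\]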
where the last equality is due to the assumption A1 and the inequality is due to the update rule of the SSUM algorithm. Taking the expectation with respect to the natural history yields
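(again a reconstruction inferred from the surrounding text; \(f\) denotes the expected cost and \(f^{r}\) the sample average cost)
\[
\mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] \le \frac{f(x^{r}) - {\hat{f}}^{r}(x^{r})}{r+1} {\mathop {\le }\limits ^{(44)}} \frac{f(x^{r}) - f^{r}(x^{r})}{r+1} {\mathop {\le }\limits ^{(45)}} \frac{\Vert f - f^{r}\Vert _\infty }{r+1},
\]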
where (44) is due to the assumption A2 and (45) follows from the definition of \(\Vert \cdot \Vert _\infty \). On the other hand, the Donsker theorem (see [50, Lemma 7] and [81, Chapter 19]) implies that there exists a constant k such that
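(a plausible reconstruction of the resulting bound)
\[
\mathbb {E}\left[ \sqrt{r}\, \Vert f - f^{r}\Vert _\infty \right] \le k, \quad \forall \ r. \tag{46}
\]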
Combining (45) and (46) yields
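(reconstructed from the bounds above)
\[
\mathbb {E}\left[ \left( \mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] \right) _{+}\right] \le \frac{k}{\sqrt{r}\, (r+1)} \le \frac{k}{r^{3/2}}, \tag{47}
\]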
where \((a)_+ \triangleq \max \{0,a\}\) denotes the projection onto the non-negative orthant. Summing (47) over r, we obtain
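(reconstructed)
\[
\sum _{r=1}^{\infty } \mathbb {E}\left[ \left( \mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] \right) _{+}\right] \le M < \infty , \tag{48}
\]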
where \(M \triangleq \sum _{r=1}^\infty \frac{k}{r^{3/2}}\). Equation (48), combined with the quasi-martingale convergence theorem (see [80] and [50, Theorem 6]), implies that the stochastic process \(\{{\hat{f}}^r(x^r) + \bar{g}\}_{r=1}^{\infty }\) is a quasi-martingale with respect to the natural history \(\{\mathcal {F}^r\}_{r=1}^{\infty }\), and that \({\hat{f}}^r(x^r)\) converges almost surely. Moreover, we have
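(the following summability property is presumably the content of (49); it is the standard consequence of the quasi-martingale convergence theorem)
\[
\sum _{r=1}^{\infty } \left| \mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] \right| < \infty , \quad \mathrm{almost\;surely}. \tag{49}
\]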
Next we use (49) to show that \(\sum _{r=1}^\infty \frac{{\hat{f}}^r(x^r) - f^r(x^r)}{r+1} <\infty ,\; \mathrm{almost\;surely}\). To this end, let us rewrite (43) as
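(reconstructed: taking conditional expectations in (43) and rearranging, as in (44))
\[
\frac{{\hat{f}}^{r}(x^{r}) - f^{r}(x^{r})}{r+1} \le \frac{f(x^{r}) - f^{r}(x^{r})}{r+1} - \mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] . \tag{50}
\]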
Using the fact that \({\hat{f}}^r(x^r) \ge f^r(x^r),\;\forall \ r\) and summing (50) over all values of r, we have
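(reconstructed)
\[
\sum _{r=1}^{\infty } \frac{{\hat{f}}^{r}(x^{r}) - f^{r}(x^{r})}{r+1} \le \sum _{r=1}^{\infty } \left| \mathbb {E}\left[ {\hat{f}}^{r+1}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) \,\Big |\, \mathcal {F}^{r}\right] \right| + \sum _{r=1}^{\infty } \frac{\Vert f - f^{r}\Vert _\infty }{r+1}. \tag{51}
\]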
Notice that the first term on the right-hand side is finite due to (49). Hence, in order to show \(\sum _{r=1}^\infty \frac{{\hat{f}}^r(x^r) - f^r(x^r)}{r+1}<\infty ,\;\mathrm{almost\;surely}\), it suffices to show that \(\sum _{r=1}^\infty \frac{\Vert f-f^r\Vert _\infty }{r+1} < \infty ,\;\mathrm{almost\;surely}\). To show this, we use the Hewitt–Savage zero-one law; see [82, Theorem 11.3] and [13, Chapter 12, Theorem 19]. Let us define the event
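(reconstructed definition)
\[
\mathcal {A} \triangleq \left\{ \left( \xi ^{1}, \xi ^{2}, \ldots \right) \; \Big | \; \sum _{r=1}^{\infty } \frac{\Vert f - f^{r}\Vert _\infty }{r+1} < \infty \right\} .
\]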
It can be checked that the event \(\mathcal {A}\) is permutable, i.e., applying any finite permutation to the sample sequence of an element of \(\mathcal {A}\) yields another element of \(\mathcal {A}\); see [82, Theorem 11.3] and [13, Chapter 12, Theorem 19]. Therefore, by the Hewitt–Savage zero-one law [82], the probability of the event \(\mathcal {A}\) is either zero or one. On the other hand, it follows from (46) that there exists \(M'>0\) such that
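(reconstructed from (46))
\[
\mathbb {E}\left[ \sum _{r=1}^{\infty } \frac{\Vert f - f^{r}\Vert _\infty }{r+1}\right] \le \sum _{r=1}^{\infty } \frac{k}{\sqrt{r}\,(r+1)} \le M' < \infty . \tag{52}
\]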
Using Markov’s inequality, (52) implies that
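(reconstructed: Markov's inequality bounds the tail of the sum, so the event \(\mathcal {A}\) has positive probability)
\[
\Pr \left( \sum _{r=1}^{\infty } \frac{\Vert f - f^{r}\Vert _\infty }{r+1} \ge t\right) \le \frac{M'}{t} \quad \forall \ t>0, \qquad \hbox {and hence} \quad \Pr (\mathcal {A}) > 0.
\]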
Hence, combining this result with the Hewitt–Savage zero-one law, we obtain \(\Pr (\mathcal {A}) = 1\); or equivalently
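(reconstructed)
\[
\sum _{r=1}^{\infty } \frac{\Vert f - f^{r}\Vert _\infty }{r+1} < \infty , \quad \mathrm{almost\;surely}. \tag{53}
\]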
As a result of (51) and (53), we have
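(reconstructed)
\[
\sum _{r=1}^{\infty } \frac{{\hat{f}}^{r}(x^{r}) - f^{r}(x^{r})}{r+1} < \infty , \quad \mathrm{almost\;surely}. \tag{54}
\]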
On the other hand, it follows from the triangle inequality that
and
where (56) is due to the assumption B3 (with \(\kappa =(K+K') \)); (57) follows from the assumption B6, and (58) will be shown in Lemma 2. Similarly, one can show that
It follows from (55), (58), and (59) that
Let us fix a random realization \(\{\xi ^r\}_{r=1}^\infty \) in the set of probability one for which (54) and (60) hold. Define
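(reconstructed definition, consistent with how \(\varpi ^r\) is used below)
\[
\varpi ^{r} \triangleq {\hat{f}}^{r}(x^{r}) - f^{r}(x^{r}).
\]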
Clearly, \(\varpi ^r \ge 0\) and \(\sum _r \frac{\varpi ^r}{r}<\infty \) due to (54). Moreover, it follows from (60) that \(|\varpi ^{r+1}-\varpi ^r| < \frac{\tau }{r}\) for some constant \(\tau >0\). Hence Lemma 3 implies that \(\lim _{r\rightarrow \infty } \varpi ^r = 0\) almost surely,
which is the desired result. \(\square \)
Lemma 2
\(\Vert x^{r+1} - x^r\Vert = \mathcal {O}(\frac{1}{r}).\)
Proof
The proof of this lemma is similar to the proof of [50, Lemma 1]; see also [83, Proposition 4.32]. First of all, since \(x^r\) is the minimizer of \({\hat{f}}^r(\cdot )\), the first-order optimality condition implies
Hence, it follows from the assumption A3 that
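(reconstructed; here A3 is read as a uniform strong convexity condition on the surrogate functions with modulus \(\gamma >0\))
\[
\gamma \, \Vert x^{r+1} - x^{r}\Vert ^{2} \le {\hat{f}}^{r}(x^{r+1}) - {\hat{f}}^{r}(x^{r}). \tag{61}
\]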
On the other hand,
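(a reconstruction of the chain culminating in (63); the intermediate identity obtained from the definitions of \({\hat{f}}^{r}\) and \({\hat{f}}^{r+1}\) is compressed here, and \(\kappa \) denotes the Lipschitz-type constant supplied by assumptions B3 and B5)
\[
{\hat{f}}^{r}(x^{r+1}) - {\hat{f}}^{r}(x^{r}) {\mathop {\le }\limits ^{(62)}} \left( {\hat{f}}^{r} - {\hat{f}}^{r+1}\right) (x^{r+1}) - \left( {\hat{f}}^{r} - {\hat{f}}^{r+1}\right) (x^{r}) {\mathop {\le }\limits ^{(63)}} \frac{\kappa }{r+1}\, \Vert x^{r+1} - x^{r}\Vert ,
\]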
where (62) follows from the fact that \(x^{r+1}\) is the minimizer of \({\hat{f}}^{r+1}(\cdot )\), the second inequality is due to the definitions of \({\hat{f}}^r\) and \({\hat{f}}^{r+1}\), while (63) is the result of the assumptions B3 and B5. Combining (61) and (63) yields the desired result. \(\square \)
Lemma 3
Assume \(\varpi ^r > 0\) and \(\sum _{r=1}^\infty \frac{\varpi ^r}{r} <\infty \). Furthermore, suppose that \(|\varpi ^{r+1}-\varpi ^r| \le {\tau }/{r}\) for all r. Then \(\lim _{r\rightarrow \infty } \varpi ^r = 0\).
Proof
Since \(\sum _{r=1}^{\infty } \frac{\varpi ^r}{r} < \infty \), we have \(\liminf _{r \rightarrow \infty } \varpi ^r = 0\). We now prove the result by contradiction. Assume the contrary, so that
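(reconstructed form of the contrary assumption (64))
\[
\limsup _{r\rightarrow \infty } \varpi ^{r} > 2\epsilon \tag{64}
\]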
for some \(\epsilon >0\). Hence there must exist subsequences \(\{m_j\}\) and \(\{n_j\}\) with \(m_j\le n_j<m_{j+1},\ \forall \ j\), so that
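(a plausible reconstruction of (65), consistent with how it is invoked in (69) below)
\[
\varpi ^{m_j} \ge 2\epsilon , \qquad \varpi ^{n_j} < \epsilon , \qquad \varpi ^{r} \ge \epsilon \;\; \hbox {for } m_j \le r < n_j . \tag{65}
\]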
On the other hand, since \(\sum _{r=1}^{\infty } \frac{\varpi ^r}{r} < \infty \), there exists an index \(\bar{r}\) such that
Therefore, for every \(r_0 \ge \bar{r}\) with \(m_j \le r_0 \le n_j-1\), we have
where (69) follows from (65), and (70) is a direct consequence of (67). Hence the triangle inequality implies
for any \(r_0 \ge \bar{r}\), which contradicts (64), implying that \(\lim _{r\rightarrow \infty } \varpi ^r = 0\).
\(\square \)