
Online non-monotone diminishing return submodular maximization in the bandit setting

Published in: Journal of Global Optimization

Abstract

In this paper, we study online diminishing return submodular (DR-submodular for short) maximization in the bandit setting. Our focus is on problems where the reward functions can be non-monotone and the constraint set is a general convex set. We first present the Single-sampling Non-monotone Frank-Wolfe algorithm. This algorithm requires only a single call to each reward function and uses stochastic gradients, making it suitable for large-scale settings where full gradient information may be unavailable. We analyze the approximation ratio and regret bound of the proposed algorithm. We then propose the Bandit Online Non-monotone Frank-Wolfe algorithm for the bandit setting, where each reward function returns only its value at a single queried point. We exploit smoothing approximations of the reward functions to tackle the challenges posed by bandit feedback. Under mild assumptions, our proposed algorithm attains a \(\frac{1}{4} (1- \min _{x\in {\mathcal {P}}'}\Vert x\Vert _\infty )\)-approximation with regret bounded by \(O (T^{\frac{5 \min \{1, \gamma \}+5 }{6 \min \{1, \gamma \}+5}})\), where the positive parameter \(\gamma \) is related to the “safety domain” \({\mathcal {P}}'\). To the best of our knowledge, this is the first work to address online non-monotone DR-submodular maximization over a general convex set in the bandit setting.


Data Availability

This research has no associated data.

Notes

  1. The set \({\mathcal {P}}'\) is a \(\delta \)-interior of \({\mathcal {P}}\) if it holds for every \(x\in {\mathcal {P}}'\) that \(B(x,\delta )\subseteq {\mathcal {P}}\).
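For a concrete instance of this definition (a hypothetical box constraint \([0,1]^n\), not a set from the paper), the \(\delta \)-interior condition can be checked coordinate-wise, since for a box the ball \(B(x,\delta )\) fits inside exactly when every coordinate of \(x\) is at least \(\delta \) away from both faces:

```python
import numpy as np

def in_delta_interior_box(x, delta, lo=0.0, hi=1.0):
    """Return True iff the Euclidean ball B(x, delta) is contained in the box
    [lo, hi]^n. For a box this holds iff every coordinate of x is at least
    delta away from both faces (hypothetical constraint set, for illustration).
    """
    x = np.asarray(x, dtype=float)
    return bool(np.all(x - delta >= lo) and np.all(x + delta <= hi))
```

For example, with \(\delta = 0.1\), the point \((0.5, 0.5)\) lies in the \(\delta \)-interior of \([0,1]^2\) while \((0.05, 0.5)\) does not.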



Corresponding author

Correspondence to Xiao Wang.

Ethics declarations

Statements & Declarations

This research was partially supported by the National Natural Science Foundation of China (Nos. 12131003, 12271278), the Major Key Project of PCL (No. PCL2022A05) and the Talent Program of Guangdong Province (No. 2021QN02X160). The authors have no relevant financial or non-financial interests to disclose.


Appendices

Appendix A: Proof of Lemma 3.5

Proof

It follows from the setting of \(d_q^k\) in Step 14 of Algorithm 1 that

$$\begin{aligned} \Vert \Delta _q^{k}\Vert ^2 =&\left\| \nabla \bar{F}_{q,k-1}(x_q^k)-d_q^k \right\| ^2 \\ =&\left\| \nabla \bar{F}_{q, k-1}(x_q^{k})-\left( 1-\rho ^k\right) d_q^{k-1}-\rho ^k {\tilde{\nabla }} F_{t_{q, k}}(x_q^{k})\right\| ^2 \\ =&\Big \Vert \rho ^k\Big (\nabla \bar{F}_{q, k-1}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k})\Big ) +\left( 1-\rho ^k\right) \left( \nabla \bar{F}_{q, k-2}(x_q^{k-1}) -d_q^{k-1}\right) \\&+ \left( 1-\rho ^k\right) \left( \nabla \bar{F}_{q, k-1}(x_q^{k})-\nabla \bar{F}_{q, k-2}(x_q^{k-1})\right) \Big \Vert ^2 \end{aligned}$$

for any \(q\in [Q]\) and \(k =2,~3,~\ldots , K.\) For brevity, denote \(A_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k})-{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k})\) and \(B_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla \bar{F}_{q, k-2}(x_q^{k-1}) \); then

$$\begin{aligned}&\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2 \right] = \, (\rho ^k)^2 \textbf{E} \left[ \left\| A_q^k \right\| ^2 \right] + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \left\| \Delta _q^{k-1} \right\| ^2 \right] + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \left\| B_q^k \right\| ^2 \right] + 2 \rho ^k\\&\quad \cdot \left( 1-\rho ^k\right) \textbf{E}\left[ \langle A_q^k, \Delta _q^{k-1} \rangle \right] + 2 \rho ^k \left( 1-\rho ^k\right) \textbf{E}\left[ \langle A_q^k, B_q^{k} \rangle \right] + 2 \left( 1-\rho ^k\right) ^2 \textbf{E}\left[ \langle \Delta _q^{k-1} , B_q^{k} \rangle \right] . \end{aligned}$$
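The equality above is the expansion of \(\Vert \rho ^k A_q^k + (1-\rho ^k)\Delta _q^{k-1} + (1-\rho ^k)B_q^k\Vert ^2\) into its six quadratic and cross terms. A quick numerical sanity check of this algebraic identity, with random vectors as stand-ins for \(A_q^k\), \(\Delta _q^{k-1}\), \(B_q^k\) (illustrative only):

```python
import numpy as np

# Expand ||rho*a + (1-rho)*b + (1-rho)*c||^2 into six terms, with a, b, c
# standing in for A_q^k, Delta_q^{k-1}, B_q^k respectively.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 5))
rho = 0.3

lhs = float(np.linalg.norm(rho * a + (1 - rho) * b + (1 - rho) * c) ** 2)
rhs = (rho**2 * (a @ a) + (1 - rho) ** 2 * (b @ b) + (1 - rho) ** 2 * (c @ c)
       + 2 * rho * (1 - rho) * (a @ b) + 2 * rho * (1 - rho) * (a @ c)
       + 2 * (1 - rho) ** 2 * (b @ c))
```

The two sides agree up to floating-point error.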

Let us bound the terms on the right-hand side of the above equality separately. Recall that \({\mathcal {F}}_{q,k}= \sigma (t_{q,1},\ldots , t_{q,k})\).

For the first term \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), it can be expressed as

$$\begin{aligned} \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] =&\textbf{E} \Big [ \Vert \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k}) + \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \Vert ^2 \Big ] \\ =&\textbf{E} \Big [ \Big \Vert \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k})\Big \Vert ^2\Big ] + \textbf{E} \Big [ \Big \Vert \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \Big \Vert ^2 \Big ] \\&+ 2\textbf{E} \Big [ \Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k}) , \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \Big \rangle \Big ]. \end{aligned}$$

Due to (7), the first term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by

$$\begin{aligned}&\textbf{E} \left[ \left\| \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k}) \right\| ^2\right] \\&\quad = \textbf{E}\left[ \textbf{E}\left[ \left\| \nabla \bar{F}_{q, k-1}(x_q^{k})-\nabla F_{t_{q, k}}(x_q^{k})\right\| ^2 | {\mathcal {F}}_{q, k-1}\right] \right] \\&\quad = \textbf{E}\Big [ \textbf{E}\left[ \left\| \nabla F_{t_{q, k}}(x_q^{k}) \right\| ^2 | {\mathcal {F}}_{q, k-1} \right] - \left\| \textbf{E}\left[ \nabla F_{t_{q, k}}(x_q^{k}) | {\mathcal {F}}_{q, k-1}\right] \right\| ^2 \Big ] \\&\quad \le \textbf{E} \left[ \left\| \nabla F_{t_{q, k}}\left( x_q^{k} \right) \right\| ^2 \right] \le L_0^2. \end{aligned}$$

Following Assumption 3.1, the second term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by

$$\begin{aligned} \textbf{E} \left[ \left\| \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \right\| ^2\right] = \textbf{E}\left[ \textbf{E}\left[ \left\| \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \right\| ^2 | {\mathcal {F}}_{q, k }\right] \right] \le \sigma ^2. \end{aligned}$$

For the third term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), it follows from the definition of \(\bar{F}_{q,k-1} \) and Assumption 3.1 that

$$\begin{aligned}&2 \textbf{E} \Big [ \Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k}) , \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \Big \rangle \Big ]\\&\quad = 2 \textbf{E}\Big [\textbf{E} \Big [ \Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla F_{t_{q, k}}(x_q^{k}) , \nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) \Big \rangle \big | \mathcal F_{q,k} \Big ] \Big ] \\&\quad = 2 \textbf{E}\Big [\Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k})-\nabla F_{t_{q, k}}(x_q^{k}), \textbf{E}\Big [\nabla F_{t_{q, k}}(x_q^{k}) -{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) | {\mathcal {F}}_{q, k}\Big ]\Big \rangle \Big ] =0. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \le L_0^2 + \sigma ^2. \end{aligned}$$
(A1)

Secondly, we consider \( \textbf{E}\left[ \left\| B_q^{k} \right\| ^2 \right] \). Note that

$$\begin{aligned}&\textbf{E} \left[ \left\| B_q^{k} \right\| ^2 \right] \nonumber \\&\quad = \textbf{E}\Bigg [ \Bigg \Vert \frac{\sum \limits _{i=k}^K \nabla F_{t_{q, i}}(x_q^{k})}{K-k+1}-\frac{\sum \limits _{i=k-1}^K \nabla F_{t_{q, i}}(x_q^{k-1})}{K-k+2} \Bigg \Vert ^2 \Bigg ] \nonumber \\&\quad = \textbf{E}\Bigg [ \Bigg \Vert \frac{\sum \limits _{i=k}^K \left( \nabla F_{t_{q, i}}(x_q^{k})-\nabla F_{t_{q, i}}(x_q^{k-1}) \right) }{K-k+2} -\frac{\nabla F_{t_{q, k-1}}(x_q^{k-1})}{K-k+2} \nonumber \\&\qquad +\frac{\sum \limits _{i=k}^K \nabla F_{t_{q, i}}(x_q^{k})}{(K-k+1)(K-k+2)} \Bigg \Vert ^2 \Bigg ] \nonumber \\&\quad \le \textbf{E}\Bigg [ \Bigg ( \frac{ \sum \limits _{i=k}^K\left\| \nabla F_{t_{q, i}}(x_q^{k})-\nabla F_{t_{q, i}}(x_q^{k-1}) \right\| }{K-k+2} + \frac{\left\| \nabla F_{t_{q, k-1}}(x_q^{k-1}) \right\| }{K-k+2} \nonumber \\&\qquad + \frac{\left\| \sum \limits _{i=k}^K \nabla F_{t_{q, i}}(x_q^{k})\right\| }{(K-k+1)(K-k+2)} \Bigg ) ^ 2 \Bigg ] \nonumber \\&\quad \le \textbf{E}\Bigg [ \Bigg ( \frac{(K-k+1)L_1 {\textrm{d}({\mathcal {P}})} \frac{1}{K}}{K-k+2} + \frac{L_0}{K-k+2} + \frac{(K-k+1) L_0}{(K-k+1)(K-k+2)} \Bigg )^2 \Bigg ] \nonumber \\&\quad \le \left( \frac{L_1 {\textrm{d}({\mathcal {P}})}}{K-k+2} + \frac{2L_0}{K-k+2} \right) ^2 = \frac{G}{(K-k+2)^2}, \end{aligned}$$
(A2)

where the second inequality is due to the fact that

$$\begin{aligned} \Vert \nabla F_{t_{q, i}}(x_q^{k})-\nabla F_{t_{q, i}}(x_q^{k-1})\Vert&\le L_1 \Vert x_q^{k} - x_q^{k-1} \Vert = L_1 \Big \Vert (v_q^{k-1}-x_q^{k-1})\frac{1}{K}\frac{\sqrt{a^{k-1}}}{a^{k}} \Big \Vert \\&\le L_1 {\textrm{d}({\mathcal {P}})} \frac{1}{K} . \end{aligned}$$

Then Young’s inequality implies

$$\begin{aligned} \textbf{E} \left[ \langle \Delta _q^{k-1} , B_q^{k} \rangle \right] \le \textbf{E} \Big [ \frac{1}{ \rho ^k}\left\| B_q^{k} \right\| ^2+\frac{\rho ^k}{4} \left\| \Delta _q^{k-1} \right\| ^2 \Big ] \le \frac{1}{\rho ^k} \frac{G}{(K-k+2)^2} +\frac{\rho ^k}{4} \textbf{E} \left[ \Vert \Delta _q^{k-1}\Vert ^2 \right] . \end{aligned}$$
(A3)
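The step above uses Young’s inequality \(\langle x, y\rangle \le \frac{1}{2\epsilon }\Vert y\Vert ^2 + \frac{\epsilon }{2}\Vert x\Vert ^2\) with \(\epsilon = \rho ^k/2\). A randomized spot-check of this form (illustrative only):

```python
import numpy as np

# Check <x, y> <= (1/rho)*||y||^2 + (rho/4)*||x||^2, which is Young's
# inequality with eps = rho/2; x, y stand in for Delta_q^{k-1} and B_q^k.
rng = np.random.default_rng(0)
ok = True
for _ in range(1_000):
    x, y = rng.standard_normal((2, 4))
    rho = rng.uniform(0.01, 1.0)
    ok &= x @ y <= (1.0 / rho) * (y @ y) + (rho / 4.0) * (x @ x) + 1e-12
```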

Thirdly, for \(\textbf{E}\left[ \langle A_q^k, \Delta _q^{k-1} \rangle \right] \), we obtain from (7) and Assumption 3.1 that

$$\begin{aligned}&\textbf{E}\left[ \langle A_q^k, \Delta _q^{k-1} \rangle \right] \nonumber \\&\quad = \textbf{E}\left[ \left\langle \nabla \bar{F}_{q, k-1}(x_q^{k})-{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}), \nabla \bar{F}_{q, k-2}(x_q^{k-1})-d_q^{k-1}\right\rangle \right] \nonumber \\&\quad = \textbf{E}\Big [\textbf{E}\Big [\Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k})-{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}), \nabla \bar{F}_{q, k-2}(x_q^{k-1}) -d_q^{k-1}\Big \rangle | {\mathcal {F}}_{q, k-1} \Big ]\Big ] \nonumber \\&\quad = \textbf{E}\Big [\Big \langle \nabla \bar{F}_{q, k-1}(x_q^{k})- \textbf{E}\left[ {\tilde{\nabla }} F_{t_{q, k}}(x_q^{k}) | {\mathcal {F}}_{q, k-1} \right] , \nabla \bar{F}_{q, k-2}(x_q^{k-1})-d_q^{k-1}\Big \rangle \Big ] = 0. \end{aligned}$$
(A4)

Similarly, we have \( \textbf{E}\left[ \langle A_q^k, B_q^{k} \rangle \right] =0\), which together with (A1)–(A3) yields

$$\begin{aligned} \textbf{E} \left[ \Vert \Delta _q^{k}\Vert ^2 \right] \le&(\rho ^k)^2 \left( L_0^2 + \sigma ^2\right) + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \Vert \Delta _q^{k-1} \Vert ^2 \right] + \left( 1-\rho ^k\right) ^2 \frac{G}{(K-k+2)^2} \\&+ \left( 1-\rho ^k\right) ^2 \Big ( \frac{2}{ \rho ^k }\frac{G}{(K-k+2)^2} + \frac{\rho ^k}{ 2} \textbf{E} \left[ \Vert \Delta _q^{k-1}\Vert ^2 \right] \Big ) . \end{aligned}$$

Hence, we derive (15).

For the case \(1 \le k \le \frac{K}{2}+1\), it follows from the setting of \(\rho ^k \) in (16) that

$$\begin{aligned} \textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le&\left( 1-\frac{2}{(k+3)^{2/3}} \right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{4 (L_0^2 + \sigma ^2) }{(k+3)^{4 / 3}} + {\frac{2 \left( \frac{(k+3)^{2/3}}{2}-1\right) G }{k^2} } \\ \le&\left( 1-\frac{2}{(k+3)^{2/3}} \right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{4 (L_0^2 + \sigma ^2)+16G }{(k+3)^{4 / 3}}\\ =&\left( 1-\frac{2}{(k+3)^{2/3}} \right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{N_0 }{(k+3)^{4 / 3}} , \end{aligned}$$

where the second inequality is due to

$$\begin{aligned}&\frac{G}{k^2}((k+3)^{2/3}-2 ) = \frac{G}{(k+3)^2} \frac{(k+3)^2}{k^2} ((k+3)^{2/3}-2 ) \\&\quad = \frac{G}{(k+3)^2} (1+\frac{3}{k})^2 ((k+3)^{2/3}-2 ) \le \frac{16 G}{(k+3)^2} ((k+3)^{2/3}-2 ) < \frac{16 G}{(k+3)^{4/3}}. \end{aligned}$$
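This elementary bound can be spot-checked numerically; the constant \(G\) cancels, leaving \(\frac{1}{k^2}\big ((k+3)^{2/3}-2\big ) < \frac{16}{(k+3)^{4/3}}\) for all integers \(k\ge 1\) (illustrative only):

```python
# Spot-check of the elementary inequality used in the second inequality above:
#   (1/k^2) * ((k+3)**(2/3) - 2) < 16 / (k+3)**(4/3)   for every integer k >= 1.
ok = all(
    ((k + 3) ** (2 / 3) - 2) / k**2 < 16 / (k + 3) ** (4 / 3)
    for k in range(1, 10_000)
)
```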

We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_1}{(k+4)^{2 / 3}}\) for \(1 \le k \le \frac{K}{2}+1\). We start with \(k=1\). Note that it follows from Assumption 3.1 and \(\rho ^1\le 1 \) that

$$\begin{aligned} \textbf{E} \left[ \Vert \Delta _q^{1}\Vert ^2\right] =&\textbf{E}\left[ \left\| \nabla \bar{F}_{q, 0}(x_q^{1})-d_q^{1}\right\| ^2 \right] = \textbf{E}\left[ \left\| \frac{\sum _{i=1}^K \nabla F_{t_{q, i}}(x_q^{1})}{K} - \rho ^1 {\tilde{\nabla }} F_{t_{q, 1}}(x_q^{1}) \right\| ^2 \right] \\ \le&(L_0+M)^2 \le \frac{N_1}{5^{2 / 3}}. \end{aligned}$$

Suppose that the argument is valid for \(k-1\). Then we have

$$\begin{aligned} \textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le&\left( 1-\frac{2}{(k+3)^{2 / 3}}\right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{N_0}{(k+3)^{4 / 3}} \\ \le&\left( 1-\frac{2}{(k+3)^{2 / 3}}\right) \frac{N_1}{(k+4-1)^{2 / 3}}+\frac{N_0}{(k+3)^{4 / 3}} \\ =&\frac{N_1 (k+3)^{2 / 3}-2 N_1}{(k+3)^{4 / 3}} +\frac{N_0}{(k+3)^{4 / 3}} \\ \le&\frac{N_1 (- 1+(k+3)^{2 / 3}) }{(k+3)^{4 / 3}} \le \frac{N_1 }{(k+4)^{2 / 3}}, \end{aligned}$$

where the last inequality is due to

$$\begin{aligned} (k+4)^{2 / 3} ((k+3)^{2 / 3}-1) \le ((k+3)^{2 / 3}+1) \left( (k+3)^{2 / 3}-1\right) \le (k+3)^{4 / 3}. \end{aligned}$$
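A quick numerical spot-check of this last elementary inequality, \((k+4)^{2/3}\big ((k+3)^{2/3}-1\big ) \le (k+3)^{4/3}\) (illustrative only):

```python
# Verify (k+4)**(2/3) * ((k+3)**(2/3) - 1) <= (k+3)**(4/3) over a range of k;
# it follows from (k+4)**(2/3) <= (k+3)**(2/3) + 1 and a difference of squares.
ok = all(
    (k + 4) ** (2 / 3) * ((k + 3) ** (2 / 3) - 1) <= (k + 3) ** (4 / 3)
    for k in range(1, 10_000)
)
```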

Let us now move to the other case \(\frac{K}{2}+2 \le k \le K\). Observe that \(\rho ^k=\frac{1.5}{(K-k+2)^{2 / 3}}\), and \( \rho ^k \le \frac{1.5}{2^{2/3}} \le 1\). Then we obtain

$$\begin{aligned}&\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \\&\quad \le \left( 1-\frac{1.5}{(K-k+2)^{2 / 3}}\right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{2.25 (L_0^2 + \sigma ^2)}{(K-k+2)^{4 / 3}} +\frac{G \left( \frac{2 (K-k+2)^{2 / 3}}{1.5}-2\right) }{(K-k+2)^2} \\&\quad \le \left( 1-\frac{1.5}{(K-k+2)^{2 / 3}}\right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] + \frac{2.25 (L_0^2 + \sigma ^2) }{(K-k+2)^{4 / 3}} +\frac{4 G}{3(K-k+2)^\frac{4}{3}} \\&\quad = \left( 1-\frac{1.5}{(K-k+2)^{2 / 3}}\right) \textbf{E}\left[ \Vert \Delta _q^{k-1}\Vert ^2\right] +\frac{N_2 }{(K-k+2)^{4 / 3}}. \end{aligned}$$

We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_3}{(K-k+1)^{2 / 3}}\) for \(\frac{K}{2}+1 \le k \le K\). When \(k=\frac{K}{2}+1\), based on previous results, it holds that

$$\begin{aligned} \textbf{E}\left[ \Vert \Delta _q^{ K / 2+1 }\Vert ^2 \right] \le \frac{N_1}{(K / 2+1+4)^{2 / 3}} \le \frac{N_1}{(K / 2)^{2 / 3}}. \end{aligned}$$

Suppose that the statement is valid for \(k-1\), where \(\frac{K}{2}+1 \le k-1 \le K-1\). Then we can derive

$$\begin{aligned} \textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le&\left( 1-\frac{1.5}{(K-k+2)^{2 / 3}}\right) \frac{N_3}{(K-k+2)^{2 / 3}} +\frac{N_2 }{(K-k+2)^{4 / 3}} \\ \le&\left( 1-\frac{1.5}{(K-k+2)^{2 / 3}}\right) \frac{N_3}{(K-k+2)^{2 / 3}} + \frac{N_3 }{(K-k+2)^{4 / 3}}\\ =&\frac{ N_3 (-0.5 +(K-k+2)^{2 / 3} )}{(K-k+2)^{4 / 3}} \le \frac{ N_3}{(K-k+1)^{2 / 3}}, \end{aligned}$$

where the third inequality holds because

$$\begin{aligned}&(-0.5 +(K-k+2)^{2 / 3} )(K-k+1)^{2/ 3} <(K-k+2)^{4/ 3} . \end{aligned}$$
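This elementary inequality can likewise be spot-checked numerically, writing \(m = K-k+2\), which ranges over integers \(m\ge 2\) in this part of the proof (illustrative only):

```python
# Verify (m**(2/3) - 0.5) * (m-1)**(2/3) < m**(4/3) for integers m >= 2,
# where m stands for K - k + 2.
ok = all(
    (m ** (2 / 3) - 0.5) * (m - 1) ** (2 / 3) < m ** (4 / 3)
    for m in range(2, 10_000)
)
```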

The proof is completed. \(\square \)

Appendix B: Proof of Lemma 4.2

Proof

Since \({\tilde{F}}_{q,k-1}\) is nonnegative, \(L_1\)-smooth and DR-submodular, it follows from (2), (3) and (20) that

$$\begin{aligned}&\langle \nabla {\tilde{F}}_{q,k-1}(x_q^k), x_{\delta }^*-x_q^k\rangle \nonumber \\&\quad \ge {\tilde{F}}_{q,k-1}(x_q^k \vee x_{\delta }^*)+{\tilde{F}}_{q,k-1}(x_q^k \wedge x_{\delta }^*)-2 {\tilde{F}}_{q,k-1}(x_q^k) \nonumber \\&\quad \ge {\tilde{F}}_{q,k-1}(x_q^k \vee x_{\delta }^*)+0-2 {\tilde{F}}_{q,k-1}(x_q^k) \nonumber \\&\quad \ge \frac{ 1- \Vert z\Vert _{\infty } }{\sqrt{a^{k}}} {\tilde{F}}_{q,k-1}(x_{\delta }^*)-2 {\tilde{F}}_{q,k-1}(x_q^k) \end{aligned}$$
(B5)

for any \(q\in [Q]\), \(k\in [K+1]\). Then it follows from Step 5 of Algorithm 2, the definition of \( {\textrm{d}({\mathcal {P}})}\), (B5), the setting of \(a^{k}\), and Young’s inequality that

$$\begin{aligned}&a^{k+1} {\tilde{F}}_{q,k-1}(x_q^{k+1})-\frac{1}{K} (1- \Vert z\Vert _{\infty } ) {\tilde{F}}_{q,k-1}(x_{\delta }^*) - a^{k} {\tilde{F}}_{q,k-1}(x_q^{k}) \\&\quad = a^{k+1}\left( {\tilde{F}}_{q,k-1}(x_q^{k+1})- {\tilde{F}}_{q,k-1}(x_q^{k}) \right) +(a^{k+1}-a^{k}) {\tilde{F}}_{q,k-1}(x_q^{k}) \\&\qquad -\frac{1}{K} (1- \Vert z\Vert _{\infty } ) {\tilde{F}}_{q,k-1}(x_{\delta }^*) \\&\quad = a^{k+1} \left( {\tilde{F}}_{q,k-1}\Big (x_q^{k}+(v_q^k-x_q^k) \frac{1}{K}\frac{\sqrt{ a^{k}} }{a^{k+1}}\Big )-{\tilde{F}}_{q,k-1}(x_q^{k})\right) \\&\qquad +(a^{k+1} -a^{k}) {\tilde{F}}_{q,k-1}(x_q^{k} ) -\frac{1}{K} (1- \Vert z\Vert _{\infty } ) {\tilde{F}}_{q,k-1}(x_{\delta }^*) \\&\quad \ge a^{k+1} \Bigg ( \Big \langle \nabla {\tilde{F}}_{q,k-1}(x_q^k), (v_q^k-x_q^k) \frac{1}{K} \frac{\sqrt{ a^{k}} }{a^{k+1}}\Big \rangle -\frac{L_1}{2}\frac{1}{K^2} \frac{a^{k}}{(a^{k+1})^2} \Vert v_q^k-x_q^k\Vert ^2\Bigg ) \\&\qquad +(a^{k+1}-a^{k}) {\tilde{F}}_{q,k-1}(x_q^{k}) -\frac{\sqrt{a^{k}}}{K} \left( \Big \langle \nabla {\tilde{F}}_{q,k-1}(x_q^k), x_{\delta }^*-x_q^k\Big \rangle + 2{\tilde{F}}_{q,k-1}(x_q^k) \right) \\&\quad \ge {\tilde{F}}_{q,k-1}(x_q^{k}) \Big ((a^{k+1}-a^{k})-\frac{2\sqrt{a^{k}}}{K}\Big ) + \Big \langle \nabla {\tilde{F}}_{q,k-1}(x_q^k), (v_q^k- x_{\delta }^*)\frac{\sqrt{a^{k}}}{K} \Big \rangle -\frac{L_1}{2}\frac{1}{K^2} {\textrm{d}}^2({\mathcal {P}}) \\&\quad = \frac{1}{K^2} {\tilde{F}}_{q,k-1}(x_q^{k}) + \Big \langle \nabla {\tilde{F}}_{q,k-1}(x_q^k)-d_q^k, (v_q^k- x_{\delta }^*)\frac{\sqrt{a^{k}}}{K}\Big \rangle + \Big \langle d_q^k, (v_q^k- x_{\delta }^*)\frac{\sqrt{a^{k}}}{K} \Big \rangle \\&\qquad -\frac{L_1 {\textrm{d}}^2({\mathcal {P}})}{2 K^2} \\&\quad \ge -\frac{\sqrt{a^{k}}}{K} \frac{1}{2 \beta ^k} \Vert \nabla {\tilde{F}}_{q,k-1}(x_q^k)-d_q^k \Vert ^2 -\frac{\sqrt{a^{k}}}{K} \frac{ \beta ^k}{2 } {\textrm{d}}^2({\mathcal {P}}) - \frac{\sqrt{a^{k}}}{K} \langle d_q^k, (x_{\delta }^* - v_q^k ) \rangle -\frac{L_1 {\textrm{d}}^2({\mathcal {P}})}{2 K^2} \end{aligned}$$

for any \(q\in [Q]\), \(k\in [K]\). \(\square \)

Appendix C: Proof of Lemma 4.4

Proof

Using \( x_q^1:=z\), \(x_q^{K+1}:=x_q\), \(a^{K+1}=4\) and \(a^1=1\), taking total expectation on both sides of (23), applying Lemma 4.1 (i), and summing the resulting inequality over all \(k\in [K]\) yields

$$\begin{aligned} \begin{aligned} \textbf{E} \left[ -{\tilde{F}}_{q} (x_q) + \frac{1}{4}(1- \Vert z\Vert _{\infty } ) {\tilde{F}}_{q}(x_{\delta }^*) + \frac{1}{4} {\tilde{F}}_{q }(z) \right] \le \frac{L_1 {\textrm{d}^2({\mathcal {P}}})}{8 K } + \sum \limits _{k=1}^{K}\frac{\sqrt{a^{k}}}{K} \frac{ \beta ^k}{8 } {\textrm{d}}^2({\mathcal {P}}) \\ + \sum \limits _{k=1}^{K} \frac{\sqrt{a^{k}}}{K} \frac{1}{8 \beta ^k} \textbf{E} \left[ \Vert \nabla {\tilde{F}}_{q,k-1}(x_q^k)-d_q^k \Vert ^2 \right] + \sum \limits _{k=1}^{K}\frac{\sqrt{a^{k}}}{4 K} \textbf{E} \left[ \langle d_q^k, (x_{\delta }^* - v_q^k ) \rangle \right] . \end{aligned} \end{aligned}$$

Since \({\tilde{F}}_{q }(z)\ge 0\), summing the inequality over \(q\) from 1 to \(Q\), we obtain

$$\begin{aligned} \begin{aligned}&\textbf{E} \left[ \sum \limits _{q=1}^{Q} \frac{ 1-\Vert z\Vert _{\infty } }{4} {\tilde{F}}_{q}(x_{\delta }^*) - \sum \limits _{q=1}^{Q} {\tilde{F}}_{q} (x_q) \right] \le \frac{Q L_1 {\textrm{d}^2({\mathcal {P}}})}{8 K } +\sum \limits _{k=1}^{K}\frac{\sqrt{a^{k}} \beta ^k Q {\textrm{d}^2({\mathcal {P}}}) }{ 8 K} \\&\quad + \sum \limits _{q=1}^{Q} \sum \limits _{k=1}^{K} \frac{\sqrt{a^{k}}}{K} \frac{1}{8 \beta ^k} \textbf{E} \left[ || \nabla {\tilde{F}}_{q,k-1} (x_q^k)-d_q^k ||^2 \right] + \sum \limits _{q=1}^{Q} \sum \limits _{k=1}^{K}\frac{\sqrt{a^{k}}}{4 K} \textbf{E} \left[ \langle d_q^k, (x_{\delta }^* - v_q^k ) \rangle \right] . \end{aligned} \end{aligned}$$

Since \(x_{\delta }^*\in {\mathcal {P}}' \), we can apply Lemma 4.3 to yield

$$\begin{aligned} \sum \limits _{q=1}^{Q} \langle d_q^k, (x_{\delta }^* - v_q^k ) \rangle \le \frac{ 2 n\bar{M}}{\delta }\sqrt{\frac{2 Q D_R}{\mu }}, ~~k \in [K]. \end{aligned}$$

Therefore, by \({a^k}\le 4\), we deduce

$$\begin{aligned} \begin{aligned} \textbf{E} \left[ \sum \limits _{q=1}^{Q} \frac{ 1-\Vert z\Vert _{\infty } }{4} {\tilde{F}}_{q}(x_{\delta }^*) - \sum \limits _{q=1}^{Q} {\tilde{F}}_{q} (x_q) \right] \le \frac{Q L_1 {\textrm{d}^2({\mathcal {P}}})}{8 K } + \sum \limits _{k=1}^{K}\frac{ \beta ^k Q {\textrm{d}^2({\mathcal {P}}}) }{ 4 K}\\ + \sum \limits _{q=1}^{Q} \sum \limits _{k=1}^{K} \frac{\textbf{E} \left[ \Vert \nabla {\tilde{F}}_{q,k-1} (x_q^k)-d_q^k \Vert ^2 \right] }{4 \beta ^k K} + \frac{ n\bar{M}}{\delta }\sqrt{\frac{2 Q D_R}{\mu }}. \end{aligned} \end{aligned}$$

Then (24) follows from (22), the definition of \(\tilde{F}_{q }\) in (19), and \(T=QL\). \(\square \)

Appendix D: Proof of Lemma 4.5

Proof

It follows from the setting of \(d_q^k\) in Step 18 of Algorithm 2 that

$$\begin{aligned} \Vert \Lambda _q^{k}\Vert ^2 =&\left\| \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-\left( 1-\rho ^k\right) d_q^{k-1}-\rho ^k g_q^{k}\right\| ^2 \\ =&\Big \Vert \rho ^k\left( \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-g_q^{k}\right) +\left( 1-\rho ^k\right) \Big (\nabla {\tilde{F}}_{q, k-2}(x_q^{k-1}) -d_q^{k-1}\Big ) \\&+\left( 1-\rho ^k\right) \left( \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-\nabla {\tilde{F}}_{q, k-2}(x_q^{k-1})\right) \Big \Vert ^2 ~~ \end{aligned}$$

for any \(q\in [Q]\) and \(k=2,\ldots ,K.\) For convenience, we denote \(C_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-g_q^{k}\) and \(D_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla {\tilde{F}}_{q, k-2}(x_q^{k-1}) \). Then we express \(\textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2 \right] \) as

$$\begin{aligned}&\textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2 \right] = (\rho ^k)^2 \textbf{E} \left[ \left\| C_q^k \right\| ^2 \right] + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \left\| \Lambda _q^{k-1} \right\| ^2 \right] + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \left\| D_q^k \right\| ^2 \right] + 2 \rho ^k \nonumber \\&\quad \cdot \left( 1-\rho ^k\right) \textbf{E}\left[ \langle C_q^k, \Lambda _q^{k-1} \rangle \right] + 2 \rho ^k \left( 1-\rho ^k\right) \textbf{E}\left[ \langle C_q^k, D_q^{k} \rangle \right] + 2 \left( 1-\rho ^k\right) ^2 \textbf{E}\left[ \langle \Lambda _q^{k-1} , D_q^{k} \rangle \right] . \end{aligned}$$
(D6)

Next we bound these terms separately. For the first term of (D6), it follows from Lemma 4.1 (ii) and the \(L_0\)-Lipschitz continuity of \( \hat{F}_t\) that

$$\begin{aligned} \textbf{E} \left[ \left\| C_q^k \right\| ^2\right] =&\textbf{E} \left[ \left\| \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k})+ \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \right\| ^2\right] \nonumber \\ =&\textbf{E} \left[ \left\| \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k}) \right\| ^2\right] +\textbf{E} \Big [ \Big \Vert \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \Big \Vert ^2\Big ] \nonumber \\&+ 2\textbf{E} \left[ \left\langle \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k}) , \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \right\rangle \right] \nonumber \\ \le&L_0^2 + \textbf{E} \left[ \left\| \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \right\| ^2\right] \nonumber \\ \le&L_0^2 + \textbf{E}\left[ \textbf{E}\left[ \left\| \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \right\| ^2 | {\mathcal {F}}_{q, k }\right] \right] \nonumber \\ =&L_0^2 + \textbf{E}\left[ \textbf{E}\left[ \left\| g_q^{k} \right\| ^ 2 | {\mathcal {F}}_{q, k } \right] - \left\| \textbf{E}\left[ g_q^{k} | {\mathcal {F}}_{q, k }\right] \right\| ^ 2 \right] \nonumber \\ \le&L_0^2+ \textbf{E}\left[ \left\| \frac{n}{\delta } F_{t_{q,k}}(x_{t_{q,k}}+\delta u_q^k ) u_q^k \right\| ^2 \right] \le L_0^2 + \left( \frac{n \bar{M}}{\delta }\right) ^2 , \end{aligned}$$
(D7)

where we use the fact that \( 2 \textbf{E} [ \langle \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k}), \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \rangle ]=0 \) by Remark 4.2. For the third term of (D6), it follows from \(L_0\)-Lipschitz continuity and \(L_1\)-smoothness of \(\hat{F}_t\) that

$$\begin{aligned} \textbf{E} \left[ \left\| D_q^{k} \right\| ^2 \right] = \,&\textbf{E}\Bigg [ \Bigg \Vert \frac{\sum \limits _{i=k}^L \nabla \hat{F}_{t_{q, i}}(x_q^{k})}{L-k+1}-\frac{\sum \limits _{i=k-1}^L \nabla \hat{F}_{t_{q, i}}(x_q^{k-1})}{L-k+2} \Bigg \Vert ^2 \Bigg ] \nonumber \\ = \,&\textbf{E}\Bigg [ \Bigg \Vert \frac{\sum \limits _{i=k}^L \left( \nabla \hat{F}_{t_{q, i}}(x_q^{k})-\nabla \hat{F}_{t_{q, i}}(x_q^{k-1}) \right) -\nabla \hat{F}_{t_{q, k-1}}(x_q^{k-1})}{L-k+2} \nonumber \\&+\frac{\sum \limits _{i=k}^L \nabla \hat{F}_{t_{q, i}}(x_q^{k})}{(L-k+1)(L-k+2)} \Bigg \Vert ^2 \Bigg ] \nonumber \\ \le&\textbf{E}\Bigg [ \Bigg ( \frac{ \sum \limits _{i=k}^L\left\| \nabla \hat{F}_{t_{q, i}}(x_q^{k})-\nabla \hat{F}_{t_{q, i}}(x_q^{k-1}) \right\| + \left\| \nabla \hat{F}_{t_{q, k-1}}(x_q^{k-1}) \right\| }{L-k+2} \nonumber \\&+ \frac{\left\| \sum \limits _{i=k}^L \nabla \hat{F}_{t_{q, i}}(x_q^{k})\right\| }{(L-k+1)(L-k+2)} \Bigg ) ^ 2 \Bigg ] \nonumber \\ \le&\textbf{E}\Bigg [ \Bigg ( \frac{(L-k+1)L_1 {\textrm{d}({\mathcal {P}})} \frac{1}{K}}{L-k+2} + \frac{L_0}{L-k+2} + \frac{(L-k+1) L_0}{(L-k+1)(L-k+2)} \Bigg )^2 \Bigg ] \nonumber \\ = \,&\left( \frac{L-k+1}{L-k+2} \frac{L_1 {\textrm{d}({\mathcal {P}})}}{K} + \frac{2L_0}{L-k+2} \right) ^2 \nonumber \\ \le&\left( \frac{3L_1 {\textrm{d}({\mathcal {P}})}}{ k+2 } + \frac{2L_0}{ k+2 } \right) ^2 =\, \frac{H}{(k+2)^2} , \end{aligned}$$
(D8)

where the second inequality is due to \(L\ge 2K\) and

$$\begin{aligned} \Vert \nabla \hat{F}_{t_{q, i}}(x_q^{k})-\nabla \hat{F}_{t_{q, i}}(x_q^{k-1})\Vert \le \frac{L_1}{K} \Big \Vert (v_q^{k-1}-x_q^{k-1})\frac{\sqrt{a^{k-1}}}{a^{k}} \Big \Vert \le \frac{{L_1\textrm{d}({\mathcal {P}})} }{K} . \end{aligned}$$

For the fourth and fifth terms of (D6), in light of the definition of \(\tilde{F}_{q,k-1} \) and Lemma 4.1 (iii), we can mimic the proof of (A4) to derive

$$\begin{aligned} \textbf{E}\left[ \langle C_q^k, \Lambda _q^{k-1} \rangle \right] =\,0 \quad \text{ and }\quad \textbf{E}\left[ \langle C_q^k, D_q^{k} \rangle \right] =\,0. \end{aligned}$$
(D9)

For the sixth term of (D6), applying Young's inequality \(\langle a, b\rangle \le \frac{\varepsilon }{2}\Vert a\Vert ^2+\frac{1}{2\varepsilon }\Vert b\Vert ^2\) with \(\varepsilon = \rho ^k/2\) gives

$$\begin{aligned} \textbf{E} \left[ \langle \Lambda _q^{k-1} , D_q^{k} \rangle \right] \le \textbf{E} \left[ \frac{1}{ \rho ^k}\left\| D_q^{k} \right\| ^2+\frac{\rho ^k}{4} \left\| \Lambda _q^{k-1} \right\| ^2 \right] \le \frac{1}{ \rho ^k} \frac{H}{( k+2)^2} +\frac{\rho ^k}{4} \textbf{E} \left[ \Vert \Lambda _q^{k-1}\Vert ^2 \right] . \end{aligned}$$
(D10)

Combining (D7)-(D10), we obtain

$$\begin{aligned} \textbf{E} \left[ \Vert \Lambda _q^{k}\Vert ^2 \right] \le&(\rho ^k)^2 \left( L_0^2 + \left( \frac{n \bar{M}}{\delta } \right) ^2\right) + \left( 1-\rho ^k\right) ^2 \textbf{E} \left[ \left\| \Lambda _q^{k-1} \right\| ^2 \right] + \left( 1-\rho ^k\right) ^2 \frac{H}{( k+2)^2} \\&+ \left( 1-\rho ^k\right) ^2\left( \frac{2}{ \rho ^k }\frac{H}{( k+2)^2} + \frac{\rho ^k}{ 2} \textbf{E} \left[ \Vert \Lambda _q^{k-1}\Vert ^2 \right] \right) \\ \le&(\rho ^k)^2 \overline{\sigma } +\frac{2 \left( \frac{1}{\rho ^k}-1\right) H }{( k+2)^2} + \left( 1-\rho ^k\right) \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] , \end{aligned}$$

where we use the fact that \(\left( 1-\rho ^k\right) ^2 (1+\frac{2}{ \rho ^k })\le \frac{2}{\rho ^k}-2 \) and \(\left( 1-\rho ^k\right) ^2 (1+\frac{\rho ^k}{ 2 })\le 1-\rho ^k \).
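Both elementary facts can be verified by direct expansion: for any \(\rho \in (0, 1]\),

$$\begin{aligned} (1-\rho )^2\left( 1+\frac{2}{\rho }\right) -\left( \frac{2}{\rho }-2\right)&= \frac{(1-\rho )\left( (1-\rho )(\rho +2)-2\right) }{\rho } = -(1-\rho )(1+\rho ) \le 0 , \\ (1-\rho )^2\left( 1+\frac{\rho }{2}\right) -\left( 1-\rho \right)&= (1-\rho )\left( (1-\rho )\left( 1+\frac{\rho }{2}\right) -1\right) = -\frac{\rho }{2}(1-\rho )(1+\rho )\le 0 . \end{aligned}$$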

If \( \rho ^k=\frac{2}{(k+2)^{2/3}}\), then \( \rho ^k \le \frac{2}{3^{2/3}} \le 1 \) for all \(k \ge 1\). When \(k\ge 2\), it follows from (25) that

$$\begin{aligned} \textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2\right] \le&\left( 1-\frac{2}{(k+2)^{2/3}}\right) \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] +\frac{4 \overline{\sigma }}{(k+2)^{4/3}} +\frac{\left( (k+2)^{2/3}-2\right) H }{( k+2)^2} \nonumber \\ \le&\left( 1-\frac{2}{(k+2)^{2/3}}\right) \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] + \frac{4\overline{\sigma }+ H }{(k+2)^{4/3}} \nonumber \\ \le&\left( 1-\frac{2}{(k+2)^{2/3}}\right) \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] + \frac{N_4}{(k+2)^{4/3}}. \end{aligned}$$
(D11)

Next we prove (26) by induction on \(k\). When \(k=1\), it follows from the \(L_0\)-Lipschitz continuity of \(\hat{F}_t \) and the definition of \(\bar{\sigma }\) in Lemma 4.5 that

$$\begin{aligned} \textbf{E} \left[ \Vert \Lambda _q^{1}\Vert ^2\right] =\,&\textbf{E}\left[ \left\| \nabla {\tilde{F}}_{q, 0}(z)-d_q^{1}\right\| ^2 \right] \\ =\,&\textbf{E}\left[ \left\| \frac{\sum _{i=1}^L \nabla \hat{F}_{t_{q, i}}(z)}{L} - (1-\rho ^1)d_q^0- \rho ^1 g_q^1 \right\| ^2 \right] \\ \le&\textbf{E}\Big [ \Big ( \frac{1}{L} \Big \Vert \sum _{i=1}^L \nabla \hat{F}_{t_{q, i}}(z) \Big \Vert + \frac{2}{3^{2/3} } \Big \Vert \frac{n}{\delta } F_{t_{q,1}}(x_{t_{q,1}}+\delta u_q^1 ) u_q^1 \Big \Vert \Big )^2 \Big ] \\ \le&\left( L_0+\frac{n \bar{M}}{\delta }\right) ^2 \le 2 L_0^2 +2 \left( \frac{n \bar{M}}{\delta } \right) ^2 = 2 \overline{\sigma } \le \frac{N_4}{(1+3)^{2 / 3}}. \end{aligned}$$

We now assume that \( \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] \le \frac{N_4}{(k+2)^{2 / 3}} \) holds for some \(k \ge 2\). Then we can obtain from (D11) and

$$\begin{aligned} (k+3)^{2 / 3}\left( (k+2)^{2 / 3}-1\right) \le \left( (k+2)^{2 / 3}+1\right) \left( (k+2)^{2 / 3}-1\right) <(k+2)^{4 / 3} \end{aligned}$$

that

$$\begin{aligned} \textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2\right] \le&\left( 1-\frac{2}{(k+2)^{2/3}}\right) \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] +\frac{N_4}{(k+2)^{4/3}} \\ \le&\left( 1-\frac{2}{(k+2)^{2 / 3}}\right) \frac{N_4}{(k-1+3)^{2 / 3}} + \frac{N_4}{(k+2)^{4 / 3}} \\ \le&\frac{N_4 (- 1+(k+2)^{2 / 3}) }{(k+2)^{4 / 3}} \le \frac{N_4 }{(k+3)^{2 / 3}}. \end{aligned}$$
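For clarity, the middle estimate can be written out explicitly:

$$\begin{aligned} \left( 1-\frac{2}{(k+2)^{2 / 3}}\right) \frac{N_4}{(k+2)^{2 / 3}} + \frac{N_4}{(k+2)^{4 / 3}} = \frac{N_4}{(k+2)^{2 / 3}} - \frac{N_4}{(k+2)^{4 / 3}} = \frac{N_4 \left( (k+2)^{2 / 3}-1\right) }{(k+2)^{4 / 3}} , \end{aligned}$$

and the final inequality is exactly the elementary bound \((k+3)^{2 / 3}\left( (k+2)^{2 / 3}-1\right) <(k+2)^{4 / 3}\) stated above, which completes the induction.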

\(\square \)

Appendix E: Proof of Theorem 4.1

Proof

By Lemma 4.2 with \( \beta ^{k} = \frac{1}{\delta (k+3)^{1 / 3} }\), the fourth term of the R.H.S. of (24) can be bounded via Lemma 4.5 as

$$\begin{aligned}&\sum \limits _{q=1}^{Q} \sum \limits _{k=1}^{K } \frac{L \textbf{E} \left[ \Vert \nabla {\tilde{F}}_{q,k-1}(x_q^k)-d_q^k \Vert ^2 \right] }{4 \beta ^k K} \\&\quad \le \sum \limits _{q=1}^{Q} \sum \limits _{k=1}^{K } \frac{ L\delta (k+3)^{1/3}}{4 K} \frac{N_4 }{(k+3)^{2 / 3}} { \le \frac{ Q L \delta N_4 }{4 K} \int _0^{K } \frac{1 }{ (x+3) ^{1 / 3}} d x}\\&\quad \le \frac{ 3 T \delta N_4 }{8 K^{1/3}} =\, \frac{ 3 \big (L_0^2 + \left( 3 L_1 {\textrm{d}({\mathcal {P}})} +2 L_0\right) ^2\big ) }{ 2^{2/3} } \frac{T \delta }{ K^{1/3} } + \frac{ 3 n^2 \bar{M}^2}{ 2^{2/3}} \frac{T }{ K^{1/3} \delta } \\&\quad \le 2 \big (L_0^2 + \left( 3 L_1 {\textrm{d}({\mathcal {P}})} +2 L_0\right) ^2\big ) \frac{T \delta }{ K^{1/3} } + 2 n^2 \bar{M}^2 \frac{T }{ K^{1/3} \delta }. \end{aligned}$$

For the third term of the R.H.S. of (24), we have

$$\begin{aligned}&\sum \limits _{k=1}^{K}\frac{L \beta ^k Q \,\textrm{d}^2({\mathcal {P}}) }{ 4K } = \sum \limits _{k=1}^{K }\frac{ L Q \,\textrm{d}^2({\mathcal {P}}) }{ 4 K \delta (k+3)^{1/3}} \\&\quad \le \frac{ L Q \,\textrm{d}^2({\mathcal {P}}) }{ 4 K \delta } \int _0^{K } \frac{1 }{(x+3)^{1 / 3}} \, d x \le \frac{ 3 T \,\textrm{d}^2({\mathcal {P}}) }{ 8 \delta K^{1/3} } . \end{aligned}$$
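The integral here admits a closed form; the last step also uses the relation \(T = LQ\) between the horizon and the block structure, which is implicit in the preceding display:

$$\begin{aligned} \int _0^{K } \frac{1 }{(x+3)^{1 / 3}} \, d x = \frac{3}{2}\left( (K+3)^{2 / 3}-3^{2 / 3}\right) \le \frac{3}{2} K^{2 / 3} , \end{aligned}$$

where the inequality uses \((K+3)^{2/3} \le K^{2/3} + 3^{2/3}\) (subadditivity of the concave map \(x \mapsto x^{2/3}\)); multiplying by \(\frac{ L Q \,\textrm{d}^2({\mathcal {P}}) }{ 4 K \delta }\) then yields \(\frac{ 3 T \,\textrm{d}^2({\mathcal {P}}) }{ 8 \delta K^{1/3} }\).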

Hence, we arrive at

$$\begin{aligned}&\textbf{E}\left[ \textrm{Regret}\left( \frac{1- \Vert z\Vert _{\infty } }{4}, T\right) \right] \le \frac{1- \Vert z\Vert _{\infty } }{4} T L_0 c_1 \delta ^\gamma + \frac{L Q L_1 \,\textrm{d}^2({\mathcal {P}})}{8 K } + \frac{ 2 n^2 \bar{M}^2 T }{ K^{1/3} \delta } \\&\quad + \frac{n\bar{M} L}{\delta } \sqrt{\frac{2 D_R Q}{\mu }}+ QK \bar{M} + \frac{ \big (2L_0^2 + 2(3 L_1 \,\textrm{d}({\mathcal {P}}) +2 L_0)^2\big )T \delta }{ K^{1/3}} + \frac{ 3 T \,\textrm{d}^2({\mathcal {P}}) }{ 8 \delta K^{1/3}} + \frac{ 5- \Vert z\Vert _{\infty } }{4} T L_0 \delta , \end{aligned}$$

which, under the stated parameter settings, yields the conclusion. \(\square \)


About this article


Cite this article

Ju, J., Wang, X. & Xu, D. Online non-monotone diminishing return submodular maximization in the bandit setting. J Glob Optim 90, 619–649 (2024). https://doi.org/10.1007/s10898-024-01413-0

