Abstract
In this paper, we study online diminishing return submodular (DR-submodular for short) maximization in the bandit setting. Our focus is on problems where the reward functions can be non-monotone and the constraint set is a general convex set. We first present the Single-sampling Non-monotone Frank-Wolfe algorithm. This algorithm requires only a single stochastic gradient call to each reward function, which makes it suitable for large-scale settings where full gradient information may be unavailable. We provide an analysis of the approximation ratio and regret bound of the proposed algorithm. We then propose the Bandit Online Non-monotone Frank-Wolfe algorithm for the bandit setting, where each reward function returns only its value at a single queried point. We take advantage of smoothing approximations of the reward functions to tackle the challenges posed by bandit feedback. Under mild assumptions, the proposed algorithm achieves a \(\frac{1}{4} (1- \min _{x\in {\mathcal {P}}'}\Vert x\Vert _\infty )\)-approximation with regret bounded by \(O (T^{\frac{5 \min \{1, \gamma \}+5 }{6 \min \{1, \gamma \}+5}})\), where the positive parameter \(\gamma \) is related to the “safety domain” \({\mathcal {P}}'\). To the best of our knowledge, this is the first work to address online non-monotone DR-submodular maximization over a general convex set in the bandit setting.
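As background for the smoothing technique mentioned above, the following is a minimal sketch of the classical one-point gradient estimator of Flaxman, Kalai, and McMahan, on which bandit algorithms of this kind are typically built; the precise estimator, smoothing radius, and parameter choices used in the proposed algorithm may differ. Writing \(n\) for the ambient dimension and \(\delta >0\) for a smoothing radius, a reward \(F_t\) is replaced by its spherical smoothing, whose gradient can be estimated from a single function value:
\[ \hat{F}_t(x) := \textbf{E}_{v \sim B^n}\big [ F_t(x+\delta v)\big ], \qquad g_t := \frac{n}{\delta }\, F_t(x+\delta u)\, u, \quad u \text{ uniform on the unit sphere}, \]
which satisfies \(\textbf{E}\left[ g_t \mid x \right] =\nabla \hat{F}_t(x)\), where \(B^n\) denotes the unit ball. Restricting the iterates to a \(\delta \)-interior \({\mathcal {P}}'\) of \({\mathcal {P}}\) (the “safety domain”) ensures that the query point \(x+\delta u\) remains feasible.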
Data Availability
This research has no associated data.
Notes
The set \({\mathcal {P}}'\) is a \(\delta \)-interior of \({\mathcal {P}}\) if it holds for every \(x\in {\mathcal {P}}'\) that \(B(x,\delta )\subseteq {\mathcal {P}}\).
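For instance, if \({\mathcal {P}}=[0,1]^n\) and \(B(x,\delta )\) is the Euclidean ball of radius \(\delta \) centered at \(x\), then \({\mathcal {P}}'=[\delta , 1-\delta ]^n\) is a \(\delta \)-interior of \({\mathcal {P}}\), since every \(y\in B(x,\delta )\) satisfies \(|y_i-x_i|\le \delta \) in each coordinate.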
Ethics declarations
Statements & Declarations
This research was partially supported by the National Natural Science Foundation of China (Nos. 12131003, 12271278), the Major Key Project of PCL (No. PCL2022A05) and the Talent Program of Guangdong Province (No. 2021QN02X160). The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 3.5
Proof
It follows from the setting of \(d_q^k\) in Step 14 of Algorithm 1 that
for any \(q\in [Q]\) and \(k =2,~3,~\ldots , K\). For brevity, we denote \(A_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k})-{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k})\) and \(B_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla \bar{F}_{q, k-2}(x_q^{k-1}) \); then
Let us bound the terms on the R.H.S. of the above equality separately. Recall that \({\mathcal {F}}_{q,k}= \sigma (t_{q,1},\ldots , t_{q,k})\).
The first term, \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), can be expressed as
Due to (7), the first term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by
Following Assumption 3.1, the second term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by
For the third term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), it follows from the definition of \(\bar{F}_{q,k-1} \) and Assumption 3.1 that
Therefore, we obtain
Secondly, we consider \( \textbf{E}\left[ \left\| B_q^{k} \right\| ^2 \right] \). Note that
where the second inequality is due to the fact that
Then, by Young's Inequality, we have
Thirdly, for \(\textbf{E}\left[ \langle A_q^k, \Delta _q^{k-1} \rangle \right] \), we obtain from (7) and Assumption 3.1 that
Similarly, we have \( \textbf{E}\left[ \langle A_q^k, B_q^{k} \rangle \right] =0\), which together with (A1)-(A3) yields
Hence, we derive (15).
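(Here and in Appendices B and D, Young's Inequality refers to the elementary bound \(2\langle a, b\rangle \le c\Vert a\Vert ^2+\frac{1}{c}\Vert b\Vert ^2\), valid for all vectors \(a, b\) and all \(c>0\); this is presumably the form being applied.)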
For the case \(1 \le k \le \frac{K}{2}+1\), it holds from the setting of \(\rho ^k \) in (16) that
where the second inequality is due to
We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_1}{(k+4)^{2 / 3}}\) for \(1 \le k \le \frac{K}{2}+1\). We start with \(k=1\). Note that it follows from Assumption 3.1 and \(\rho ^1\le 1 \) that
Suppose that the argument is valid for \(k-1\). Then we have
where the last inequality is due to
Let us now move to the other case \(\frac{K}{2}+2 \le k \le K\). Observe that \(\rho ^k=\frac{1.5}{(K-k+2)^{2 / 3}}\), and \( \rho ^k \le \frac{1.5}{2^{2/3}} \le 1\). Then we obtain
We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_3}{(K-k+1)^{2 / 3}}\) for \(\frac{K}{2}+1 \le k \le K\). When \(k=\frac{K}{2}+1\), based on the previous results, it holds that
Suppose that the statement is valid for \(k-1\), where \(\frac{K}{2}+1 \le k-1 \le K\). Then we can derive
where the third inequality holds because
The proof is completed. \(\square \)
Appendix B: Proof of Lemma 4.2
Proof
Since \({\tilde{F}}_{q,k-1}\) is nonnegative, \(L_1\)-smooth and DR-submodular, it follows from (2), (3) and (20) that
for any \(q\in [Q]\), \(k\in [K+1]\). Then it follows from Step 5 of Algorithm 2, the definition of \( {\textrm{d}({\mathcal {P}})}\), (B5), the setting of \(a^{k}\), and Young's Inequality that
for any \(q\in [Q]\), \(k\in [K]\). \(\square \)
Appendix C: Proof of Lemma 4.4
Proof
Using \( x_q^1:=z\), \(x_q^{K+1}:=x_q\), \(a^{K+1}=4\), and \(a^1=1\), taking the total expectation on both sides of (23), applying Lemma 4.1 (i), and summing this inequality over all \(k\in [K]\) yields
Since \({\tilde{F}}_{q }(z)\ge 0\), we sum the inequality over \(q\) from 1 to \(Q\) and obtain
Since \(x_{\delta }^*\in {\mathcal {P}}' \), we can apply Lemma 4.3 to yield
Therefore, by \({a^k}\le 4\), we deduce
This yields (24) in view of (22), the definition of \(\tilde{F}_{q }\) in (19), and \(T=QL\). \(\square \)
Appendix D: Proof of Lemma 4.5
Proof
Note that from the setting of \(d_q^k\) in Step 18 of Algorithm 2,
for any \(q\in [Q]\) and \(k=2,\ldots ,K\). For convenience, we denote \(C_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-g_q^{k}\) and \(D_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla {\tilde{F}}_{q, k-2}(x_q^{k-1}) \). Then we express \( \textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2 \right] \) as
Next we handle these terms in turn. For the first term of (D6), it holds from Lemma 4.1 (ii) and the \(L_0\)-Lipschitz continuity of \( \hat{F}_t\) that
where we use the fact that \( 2 \textbf{E} [ \langle \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k}), \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \rangle ]=0 \) by Remark 4.2. For the third term of (D6), it follows from \(L_0\)-Lipschitz continuity and \(L_1\)-smoothness of \(\hat{F}_t\) that
where the second inequality is due to \(L\ge 2K\) and
For the fourth and fifth terms of (D6), in light of the definition of \(\tilde{F}_{q,k-1} \) and Lemma 4.1 (iii), we can mimic the proof of (A4) and derive
For the sixth term of (D6), it follows from Young's Inequality that
Combining the above, (D7)-(D10) jointly guarantee
where we use the fact that \(\left( 1-\rho ^k\right) ^2 (1+\frac{2}{ \rho ^k })\le \frac{2}{\rho ^k}-2 \) and \(\left( 1-\rho ^k\right) ^2 (1+\frac{\rho ^k}{ 2 })\le 1-\rho ^k \).
If \( \rho ^k=\,\frac{2}{(k+2)^{2/3}}\), then \( \rho ^k \le \frac{2}{ 2.08 } \le 1 \). When \(k\ge 2\), it follows from (25) that
Next we prove (26) by induction on k. When \(k=1\), it follows from \(L_0\)-Lipschitz continuity of \(F_t \) and the definition of \(\bar{\sigma }\) in Lemma 4.5 that
We now assume \( \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] \le \frac{N_4}{(k+2)^{2 / 3}}. \) Then we can obtain from (D11) and
that
\(\square \)
Appendix E: Proof of Theorem 4.1
Proof
By Lemma 4.2 with \( \beta ^{k} = \frac{1}{\delta (k+3)^{1 / 3} }\), for the fourth term of the R.H.S. of (24), it follows from Lemma 4.5 that
For the third term of the R.H.S. of (24), we have
Hence, we obtain
which, together with the parameter settings, yields the conclusion. \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ju, J., Wang, X. & Xu, D. Online non-monotone diminishing return submodular maximization in the bandit setting. J Glob Optim 90, 619–649 (2024). https://doi.org/10.1007/s10898-024-01413-0