Abstract
In this paper, we study online diminishing return submodular (DR-submodular for short) maximization in the bandit setting. Our focus is on problems where the reward functions can be non-monotone and the constraint set is a general convex set. We first present the Single-sampling Non-monotone Frank-Wolfe algorithm. This algorithm requires only a single stochastic gradient call to each reward function, which makes it suitable for large-scale settings where full gradient information may be unavailable. We provide an analysis of the approximation ratio and regret bound of the proposed algorithm. We then propose the Bandit Online Non-monotone Frank-Wolfe algorithm for the bandit setting, where each reward function returns only its value at a single queried point. We take advantage of smoothing approximations of the reward functions to tackle the challenges posed by bandit feedback. Under mild assumptions, the proposed algorithm achieves a \(\frac{1}{4} (1- \min _{x\in {\mathcal {P}}'}\Vert x\Vert _\infty )\)-approximation with regret bounded by \(O (T^{\frac{5 \min \{1, \gamma \}+5 }{6 \min \{1, \gamma \}+5}})\), where the positive parameter \(\gamma \) is related to the “safety domain” \({\mathcal {P}}'\). To the best of our knowledge, this is the first work to address online non-monotone DR-submodular maximization over a general convex set in the bandit setting.
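As background for the smoothing technique mentioned above, the following is a minimal sketch of the classical one-point gradient estimator of Flaxman, Kalai, and McMahan, on which bandit algorithms of this kind are typically built; the precise estimator, smoothing radius, and parameter choices used in the proposed algorithm may differ. Writing \(n\) for the ambient dimension and \(\delta >0\) for a smoothing radius, a reward \(F_t\) is replaced by its spherical smoothing, whose gradient can be estimated from a single function value:
\[ \hat{F}_t(x) := \textbf{E}_{v \sim B^n}\big [ F_t(x+\delta v)\big ], \qquad g_t := \frac{n}{\delta }\, F_t(x+\delta u)\, u, \quad u \text{ uniform on the unit sphere}, \]
which satisfies \(\textbf{E}\left[ g_t \mid x \right] =\nabla \hat{F}_t(x)\), where \(B^n\) denotes the unit ball. Restricting the iterates to a \(\delta \)-interior \({\mathcal {P}}'\) of \({\mathcal {P}}\) (the “safety domain”) ensures that the query point \(x+\delta u\) remains feasible.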
Data Availability
This research has no associated data.
Notes
The set \({\mathcal {P}}'\) is a \(\delta \)-interior of \({\mathcal {P}}\) if it holds for every \(x\in {\mathcal {P}}'\) that \(B(x,\delta )\subseteq {\mathcal {P}}\).
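For instance, if \({\mathcal {P}}=[0,1]^n\) and \(B(x,\delta )\) is the Euclidean ball of radius \(\delta \) centered at \(x\), then \({\mathcal {P}}'=[\delta , 1-\delta ]^n\) is a \(\delta \)-interior of \({\mathcal {P}}\), since every \(y\in B(x,\delta )\) satisfies \(|y_i-x_i|\le \delta \) in each coordinate.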
Ethics declarations
Statements & Declarations
This research was partially supported by the National Natural Science Foundation of China (Nos. 12131003, 12271278), the Major Key Project of PCL (No. PCL2022A05) and the Talent Program of Guangdong Province (No. 2021QN02X160). The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Lemma 3.5
Proof
It follows from the setting of \(d_q^k\) in Step 14 of Algorithm 1 that
for any \(q\in [Q]\) and \(k =2,~3,~\ldots , K\). For brevity, we denote \(A_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k})-{\tilde{\nabla }} F_{t_{q, k}}(x_q^{k})\) and \(B_q^k:= \nabla \bar{F}_{q, k-1}(x_q^{k}) -\nabla \bar{F}_{q, k-2}(x_q^{k-1}) \); then
Let us bound the terms on the R.H.S. of the above equality separately. Recall that \({\mathcal {F}}_{q,k}= \sigma (t_{q,1},\ldots , t_{q,k})\).
The first term, \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), can be expressed as
Due to (7), the first term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by
Following Assumption 3.1, the second term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \) is upper bounded by
For the third term in \( \textbf{E} \left[ \left\| A_q^k \right\| ^2\right] \), it follows from the definition of \(\bar{F}_{q,k-1} \) and Assumption 3.1 that
Therefore, we obtain
Secondly, we consider \( \textbf{E}\left[ \left\| B_q^{k} \right\| ^2 \right] \). Note that
where the second inequality is due to the fact that
Then, by Young's Inequality, we have
Thirdly, for \(\textbf{E}\left[ \langle A_q^k, \Delta _q^{k-1} \rangle \right] \), we obtain from (7) and Assumption 3.1 that
Similarly, we have \( \textbf{E}\left[ \langle A_q^k, B_q^{k} \rangle \right] =0\), which together with (A1)-(A3) yields
Hence, we derive (15).
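(Here and in Appendices B and D, Young's Inequality refers to the elementary bound \(2\langle a, b\rangle \le c\Vert a\Vert ^2+\frac{1}{c}\Vert b\Vert ^2\), valid for all vectors \(a, b\) and all \(c>0\); this is presumably the form being applied.)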
For the case \(1 \le k \le \frac{K}{2}+1\), it holds from the setting of \(\rho ^k \) in (16) that
where the second inequality is due to
We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_1}{(k+4)^{2 / 3}}\) for \(1 \le k \le \frac{K}{2}+1\). We start with \(k=1\). Note that it follows from Assumption 3.1 and \(\rho ^1\le 1 \) that
Suppose that the argument is valid for \(k-1\). Then we have
where the last inequality is due to
Let us now move to the other case \(\frac{K}{2}+2 \le k \le K\). Observe that \(\rho ^k=\frac{1.5}{(K-k+2)^{2 / 3}}\), and \( \rho ^k \le \frac{1.5}{2^{2/3}} \le 1\). Then we obtain
We shall prove by induction that \(\textbf{E}\left[ \Vert \Delta _q^{k}\Vert ^2\right] \le \frac{N_3}{(K-k+1)^{2 / 3}}\) for \(\frac{K}{2}+1 \le k \le K\). When \(k=\frac{K}{2}+1\), based on the previous results, it holds that
Suppose that the statement is valid for \(k-1\), where \(\frac{K}{2}+1 \le k-1 \le K\). Then we can derive
where the third inequality holds because
The proof is completed. \(\square \)
Appendix B: Proof of Lemma 4.2
Proof
Since \({\tilde{F}}_{q,k-1}\) is nonnegative, \(L_1\)-smooth and DR-submodular, it follows from (2), (3) and (20) that
for any \(q\in [Q]\), \(k\in [K+1]\). Then it follows from Step 5 of Algorithm 2, the definition of \( {\textrm{d}({\mathcal {P}})}\), (B5), the setting of \(a^{k}\), and Young's Inequality that
for any \(q\in [Q]\), \(k\in [K]\). \(\square \)
Appendix C: Proof of Lemma 4.4
Proof
Using \( x_q^1:=z\), \(x_q^{K+1}:=x_q\), \(a^{K+1}=4\), and \(a^1=1\), taking the total expectation on both sides of (23), applying Lemma 4.1 (i), and summing this inequality over all \(k\in [K]\) yields
Since \({\tilde{F}}_{q }(z)\ge 0\), we sum the inequality over \(q\) from 1 to \(Q\) and obtain
Since \(x_{\delta }^*\in {\mathcal {P}}' \), we can apply Lemma 4.3 to yield
Therefore, by \({a^k}\le 4\), we deduce
This yields (24) in view of (22), the definition of \(\tilde{F}_{q }\) in (19), and \(T=QL\). \(\square \)
Appendix D: Proof of Lemma 4.5
Proof
Note that from the setting of \(d_q^k\) in Step 18 of Algorithm 2,
for any \(q\in [Q]\) and \(k=2,\ldots ,K\). For convenience, we denote \(C_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k})-g_q^{k}\) and \(D_q^k:= \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla {\tilde{F}}_{q, k-2}(x_q^{k-1}) \). Then we express \( \textbf{E}\left[ \Vert \Lambda _q^{k}\Vert ^2 \right] \) as
Next we handle these terms in turn. For the first term of (D6), it holds from Lemma 4.1 (ii) and the \(L_0\)-Lipschitz continuity of \( \hat{F}_t\) that
where we use the fact that \( 2 \textbf{E} [ \langle \nabla {\tilde{F}}_{q, k-1}(x_q^{k}) -\nabla \hat{F}_{t_{q, k}}(x_q^{k}), \nabla \hat{F}_{t_{q, k}}(x_q^{k}) -g_q^{k} \rangle ]=0 \) by Remark 4.2. For the third term of (D6), it follows from \(L_0\)-Lipschitz continuity and \(L_1\)-smoothness of \(\hat{F}_t\) that
where the second inequality is due to \(L\ge 2K\) and
For the fourth and fifth terms of (D6), in light of the definition of \(\tilde{F}_{q,k-1} \) and Lemma 4.1 (iii), we can mimic the proof of (A4) and derive
For the sixth term of (D6), it follows from Young's Inequality that
Combining the above, (D7)-(D10) jointly guarantee
where we use the fact that \(\left( 1-\rho ^k\right) ^2 (1+\frac{2}{ \rho ^k })\le \frac{2}{\rho ^k}-2 \) and \(\left( 1-\rho ^k\right) ^2 (1+\frac{\rho ^k}{ 2 })\le 1-\rho ^k \).
If \( \rho ^k=\,\frac{2}{(k+2)^{2/3}}\), then \( \rho ^k \le \frac{2}{ 2.08 } \le 1 \). When \(k\ge 2\), it follows from (25) that
Next we prove (26) by induction on k. When \(k=1\), it follows from \(L_0\)-Lipschitz continuity of \(F_t \) and the definition of \(\bar{\sigma }\) in Lemma 4.5 that
We now assume \( \textbf{E}\left[ \Vert \Lambda _q^{k-1}\Vert ^2\right] \le \frac{N_4}{(k+2)^{2 / 3}}. \) Then we can obtain from (D11) and
that
\(\square \)
Appendix E: Proof of Theorem 4.1
Proof
By Lemma 4.2 with \( \beta ^{k} = \frac{1}{\delta (k+3)^{1 / 3} }\), for the fourth term of the R.H.S. of (24), it follows from Lemma 4.5 that
For the third term of the R.H.S. of (24), we have
Hence, we obtain
which, together with the parameter settings, yields the conclusion. \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ju, J., Wang, X. & Xu, D. Online non-monotone diminishing return submodular maximization in the bandit setting. J Glob Optim 90, 619–649 (2024). https://doi.org/10.1007/s10898-024-01413-0