Abstract
We consider maximizing a monotone submodular function under a cardinality constraint or a knapsack constraint in the streaming setting. In particular, the elements arrive sequentially and at any point of time, the algorithm has access to only a small fraction of the data stored in primary memory. We propose the following streaming algorithms taking O(ε− 1) passes: (1) a (1 − e− 1 − ε)-approximation algorithm for the cardinality-constrained problem, (2) a (0.5 − ε)-approximation algorithm for the knapsack-constrained problem. Both of our algorithms run deterministically in O∗(n) time, using O∗(K) space, where n is the size of the ground set and K is the size of the knapsack. Here the term O∗ hides a polynomial of \(\log K\) and ε− 1. Our streaming algorithms can also be used as fast approximation algorithms. In particular, for the cardinality-constrained problem, our algorithm takes \(O(n\varepsilon ^{-1} \log (\varepsilon ^{-1}\log K) )\) time, improving on the algorithm of Badanidiyuru and Vondrák that takes \(O(n \varepsilon ^{-1} \log (\varepsilon ^{-1} K) )\) time.
Similar content being viewed by others
References
Ahn, K.J., Guha, S.: Linear programming in the semi-streaming model with application to the maximum matching problem. Inform. Comput. 222, 59–79 (2013). 38th International Colloquium on Automata Languages and Programming (ICALP 2011)
Alaluf, N., Ene, A., Feldman, M., Nguyen, H.L., Suh, A.: Optimal streaming algorithms for submodular maximization with cardinality constraints. In: Czumaj, A., Dawar, A., Merelli, E. (eds.) 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.ICALP.2020.6, vol. 168, pp 6:1–6:19 (2020)
Alon, N., Gamzu, I., Tennenholtz, M.: Optimizing budget allocation among channels and influencers. In: Proceedings of the 21st international conference on world wide web (WWW), pp 381–388 (2012)
Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 671–680 (2014)
Badanidiyuru, A., Vondrák, J.: Fast algorithms for maximizing submodular functions. In: Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 1497–1514 (2013)
Balkanski, E., Rubinstein, A., Singer, Y.: An exponential speedup in parallel running time for submodular maximization without loss in approximation. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms SODA, pp 283–302 (2019)
Balkanski, E., Singer, Y.: The adaptive complexity of maximizing a submodular function. In: Proceedings of the 50th Annual ACM Symposium on Theory of Computing, STOC 2018, pp 1138–1151. ACM, New York (2018)
Barbosa, R.D.P., Ene, A., Nguyễn, H.L., Ward, J.: A new framework for distributed submodular maximization. In: 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp 645–654 (2016)
Bateni, M., Esfandiari, H., Mirrokni, V.: Almost optimal streaming algorithms for coverage problems. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pp 13–23 (2017)
Calinescu, G., Chekuri, C., Pál, M., Vondrák, J.: Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput. 40(6), 1740–1766 (2011)
Chakrabarti, A., Kale, S.: Submodular maximization meets streaming: matchings, matroids, and more. Math. Program. 154(1-2), 225–247 (2015)
Chan, T.H.H., Huang, Z., Jiang, S.H.C., Kang, N., Tang, Z.G.: Online submodular maximization with free disposal: Randomization beats for partition matroids online. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 1204–1223 (2017)
Chan, T.H.H., Jiang, S.H.C., Tang, Z.G., Wu, X.: Online Submodular Maximization Problem with Vector Packing Constraint. In: Annual European Symposium on Algorithms (ESA), pp 24:1–24:14 (2017)
Chekuri, C., Gupta, S., Quanrud, K.: Streaming algorithms for submodular function maximization. In: Proceedings of the 42nd International Colloquium on Automata, Languages, and Programming (ICALP), vol. 9134, pp 318–330 (2015)
Chekuri, C., Quanrud, K.: Submodular function maximization in parallel via the multilinear relaxation. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 303–322 (2019)
Chekuri, C., Vondrák, J., Zenklusen, R.: Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM J. Comput. 43(6), 1831–1879 (2014)
Ene, A., Nguyễn, H.L.: A nearly-linear time algorithm for submodular maximization with a knapsack constraint. In: Baier, C., Chatzigiannakis, I., Flocchini, P., Leonardi, S. (eds.) 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), LIPIcs, vol. 132, pp 53:1–53:12 (2019)
Ene, A., Nguyễn, H.L.: Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In: Proceedings of the 30th annual ACM-SIAM symposium on discrete algorithms (SODA), pp 274–282 (2019)
Feldman, M., Karbasi, A., Kazemi, E.: Do less, get more: streaming submodular maximization with subsampling. In: Annual Conference on Neural Information Processing Systems (NeurIPS), pp 730–740 (2018)
Feldman, M., Norouzi-Fard, A., Svensson, O., Zenklusen, R.: The one-way communication complexity of submodular maximization with applications to streaming and robustness. In: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pp 1363–1374 (2020)
Filmus, Y., Ward, J.: A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. SIAM J. Comput. 43(2), 514–542 (2014)
Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions i. Math. Program., pp. 265–294 (1978)
Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for maximizing submodular set functions ii. Math Prog Study 8, 73–87 (1978)
Haba, R., Kazemi, E., Feldman, M., Karbasi, A.: Streaming submodular maximization under a k-set system constraint. In: H.D. III, Singh, A (eds.) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v119/haba20a.html, vol. 119, pp 3939–3949 (2020)
Huang, C., Kakimura, N.: Improved streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica 83 (3), 879–902 (2021). https://doi.org/10.1007/s00453-020-00786-4
Huang, C.C., Kakimura, N., Mauras, S., Yoshida, Y.: Approximability of Monotone Submodular Function Maximization under Cardinality and Matroid Constraints in the Streaming Model, SIAM Journal on Discrete Mathematics, to appear (2021)
Huang, C.C., Kakimura, N., Yoshida, Y.: Streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. Algorithmica 82(4), 1006–1032 (2020)
Kale, S., Tirodkar, S.: Maximum matching in two, three, and a few more passes over graph streams. In: The 20th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX2017) (2017)
Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., Karbasi, A.: Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp 3311–3320 (2019)
Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp 137–146 (2003)
Krause, A., Singh, A.P., Guestrin, C.: Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. J. Mach. Learn. Res. 9, 235–284 (2008)
Kulik, A., Shachnai, H., Tamir, T.: Maximizing submodular set functions subject to multiple linear constraints. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp 545–554 (2013)
Kumar, R., Moseley, B., Vassilvitskii, S., Vattani, A.: Fast greedy algorithms in mapreduce and streaming. ACM Trans. Parallel Comput. 2(3), 14:1–14:22 (2015)
Lee, J.: Maximum entropy sampling, encyclopedia of environmetrics, vol. 3, pp 1229–1234. Wiley, New York (2006)
Lee, J., Sviridenko, M., Vondrák, J.: Submodular maximization over multiple matroids via generalized exchange properties. Math. Oper. Res. 35(4), 795–806 (2010)
Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language technologies (NAACL-HLT), pp 912–920 (2010)
Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pp 510–520 (2011)
McGregor, A.: Finding graph matchings in data streams. In: Proceedings of the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems, APPROX05, pp 170–181 (2005)
McGregor, A.: Graph stream algorithms: a survey. SIGMOD Rec. 43(1), 9–20 (2014)
McGregor, A., Vu, H.T.: Better streaming algorithms for the maximum coverage problem. In: International Conference on Database Theory (ICDT) (2017)
Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., Krause, A.: Lazier than lazy greedy. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI2015, pp 1812–1818. AAAI Press (2015)
Mirzasoleiman, B., Jegelka, S., Krause, A.: Streaming non-monotone submodular maximization: personalized video summarization on the fly. In: Proc. Conference on Artificial Intelligence (AAAI) (2018)
Norouzi-Fard, A., Tarnawski, J., Mitrovic, S., Zandieh, A., Mousavifar, A., Svensson, O.: Beyond 1/2-approximation for submodular maximization on massive data streams. In: Proceedings of the 35th International Conference on Machine Learning (ICML), pp 3826–3835 (2018)
Nutov, Z., Shoham, E.: Practical budgeted submodular maximization (2020)
da Ponte Barbosa, R., Ene, A., Nguyễn, H.L., Ward, J.: The power of randomization: Distributed submodular maximization on massive datasets. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, JMLR workshop and conference proceedings. JMLR.org. http://proceedings.mlr.press/v37/barbosa15.html, vol. 37, pp 1236–1244 (2015)
Soma, T., Kakimura, N., Inaba, K., Kawarabayashi, K.: Optimal budget allocation: Theoretical guarantee and efficient algorithm. In: Proceedings of the 31st International Conference on Machine Learning (ICML), pp 351–359 (2014)
Sviridenko, M.: A note on maximizing a submodular set function subject to a knapsack constraint. Oper. Res. Lett. 32(1), 41–43 (2004)
Wolsey, L.: Maximising real-valued submodular functions: primal and dual heuristics for location problems. Math. Oper. Res. (1982)
Yaroslavtsev, G., Zhou, S., Avdiukhin, D.: “Bring Your Own Greedy”+Max: near-optimal 1/2-approximations for submodular knapsack. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp 3263–3274 (2020)
Yoshida, Y.: Maximizing a monotone submodular function with a bounded curvature under a knapsack constraint. SIAM J. Discrete Math. 33(3), 1452–1471 (2019)
Yu, Q., Xu, E.L., Cui, S.: Streaming algorithms for news and scientific literature recommendation: Submodular maximization with a d-knapsack constraint. IEEE Global Conf Signal Inf Process (2016)
Acknowledgements
The authors thank the referees for their valuable comments on this manuscript.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The first author was supported by French National Research Agency (ANR), with grant ANR-19-CE48-0016 and ANR-18-CE40-0025-01.The second author was supported by JSPS KAKENHI Grant Numbers JP17K00028, JP18H05291, 20K21834, and 20H05795.
Appendix A: Proof of Lemma 7
Appendix A: Proof of Lemma 7
We discuss how to obtain a (0.5 − O(ε))-approximation when c1 + c2 is almost 1.
Lemma 22
Suppose that \(f(o_{1}+o_{2})\geq v^{\prime }\). We can find a set S using two passes and O(ε− 1K) space such that |S| = 2 and \(f(S)\geq \left (\frac {2}{3}-\varepsilon \right )v^{\prime }\).
We begin by reviewing the algorithmFootnote 3 in [27].
Theorem 8
Let \(E_{\mathrm {R}} \subseteq E\) be a subset of the ground set (and we call ER red items). Let \(X\subseteq E\) such that v ≤ f(X) ≤ (1 + ε)v. Assume that there exists x ∈ X ∩ ER such that \(\underline {\tau } v \leq f(x) \leq \overline {\tau }v\). Then we can find a set \(Y \subseteq E_{\mathrm {R}}\) of red items, in one pass and O(n) time, with \(|Y| = O(\log _{1+\varepsilon } \frac {\overline {\tau }}{ \underline {\tau }})\) such that some item e∗ in Y satisfies f(X − x + e∗) ≥ (2/3 − O(ε))v.
Proof of Lemma 22
For each \(t=1,2,\dots , K/2\), define Et = {e ∈ E∣t ≤ c(e) ≤ K − t} as the red items. The critical thing to observe is that, if t ≤ c(o2), we see o1 ∈ Et.
The above observation suggests the following implementation. In the first pass, for each set Et, apply Theorem 8 to collect a set \(X_{t} \subseteq E_{t}\) (apparently we can set \(\overline {\tau }= 2/3\) and \(\underline {\tau }=1/3\)). Since \(|X_{t}|=O(\log _{1+\varepsilon }2)=O(\varepsilon ^{-1})\), it takes O(ε− 1K) space and O(n) time in total. Then it follows from Theorem 8 that, for each t with t ≤ c(o2), there exists e∗∈ Xt such that \(f(o_{2}+e^{\ast })\geq (2/3- O(\varepsilon )) v^{\prime }\) and c(e∗) ≤ K − c(o2). In the second pass, for each item e in E, check whether there exists \(e^{\prime }\) in Xc(e) such that \(c(e+e^{\prime })\leq K\) and \(f(e + e^{\prime }) \geq (2/3-O(\varepsilon ))v^{\prime }\). It follows that there exists at least one pair of e and \(e^{\prime }\) satisfying the condition. The second pass also takes O(ε− 1K) space as we keep Xt’s. Since |Xt| = O(ε− 1), the second phase takes O(ε− 1n) time. □
Suppose that v ≤ f(OPT) ≤ (1 + ε)v. If f(o1 + o2) ≥ 0.75v, then we are done using Lemma 22. So assume otherwise, meaning that f(OPT − o1 − o2) ≥ 0.25v. Notice that we can also assume that f(OPT − o1) ≥ 0.5v. Now consider two possibilities.
Claim
If \(c_{1} \geq 1- \sqrt {\varepsilon }\), then we can find a set S in O(ε− 1) passes and O(K) space such that c(S) ≤ K and f(S) ≥ (0.5 − O(ε))v.
Proof
Since \(c_{1} \geq 1- \sqrt {\varepsilon }\), we have \(c(\text {OPT} - o_{1}) \leq \sqrt {\varepsilon } K\). Consider the problem (1) to approximate OPT − o1. Then the largest item in OPT − o1 is c(o2) which is at most \(\sqrt {\varepsilon }K\). By Corollary 1, Simple\((\mathcal {I};0.5v, \sqrt {\varepsilon }K)\) can obtain a set S satisfying that
where the last inequality follows because \(e^{-\frac {1 - \sqrt {\varepsilon } }{\sqrt {\varepsilon }}} \leq \varepsilon \) when ε ≤ 1. □
Claim
If \(c_{1} < 1- \sqrt {\varepsilon }\), then we can find a set S in O(ε− 1) passes and O(K) space such that c(S) ≤ K and f(S) ≥ (0.5 − O(ε))v.
Proof
Consider the problem:
to approximate OPT − o1 − o1. Let \(\mathcal {I}^{\prime }\) be the corresponding instance. Since f(OPT − o1 − o2) ≥ 0.25v and c(OPT − o1 − o2) ≤ εK, Corollary 1 implies that Simple\((\mathcal {I}^{\prime };\) 0.25v,εK) can obtain a set Y satisfying that
since the largest item in OPT − o1 − o2 has size at most ε. After taking the set Y, we still have space for packing either o1 or o2, since \(c(Y)\leq \sqrt {\varepsilon } K <K-c(o_{1})\).
Define g := f(⋅∣Y ). If some element e satisfies c(Y ) + c(e) ≤ K and f(Y + e) ≥ 0.5v, then we are done. Thus we may assume that no such element exists, implying that f(Y + oℓ) < 0.5v for ℓ = 1, 2. Hence it holds that
This implies that
Consider the problem:
to approximate OPT − o1 − o1. Denote by \(\mathcal {I}^{\prime \prime }\) the corresponding instance. Since \(K-c(Y)\geq (1-\sqrt {\varepsilon })K\) and g(OPT − o1 − o2) ≥ (0.25 − O(ε))v, Corollary 1 implies that Simple\((\mathcal {I}^{\prime \prime }; (0.25 - O(\varepsilon ))v, \varepsilon K)\) can obtain a set S satisfying that
Therefore, Y ∪ S satisfies that c(Y ∪ S) ≤ K and
□
For a given v, the above can be done in O(ε− 1K) space using O(ε− 1) passes. The total running time is O(nε− 1). This completes the proof of Lemma 7.
Rights and permissions
About this article
Cite this article
Huang, CC., Kakimura, N. Multi-Pass Streaming Algorithms for Monotone Submodular Function Maximization. Theory Comput Syst 66, 354–394 (2022). https://doi.org/10.1007/s00224-021-10065-6
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-021-10065-6